How to visualize decision trees (explained.ai)
343 points by parrt 9 months ago | 23 comments

Decision trees are the fundamental building block of gradient boosting machines and Random Forests™, probably the two most popular machine learning models for structured data. Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. Unfortunately, current visualization packages are rudimentary and not immediately helpful to the novice. So, we've created a general package called animl for scikit-learn decision tree visualization and model interpretation.

This is cool. I like it, and will probably use it in my work, but it feels like there’s a lot going on. I don’t like how some of the final leaf nodes seem to be shown differently than the nodes higher up: sometimes different chart types, sometimes reversed axes. I would also recommend using swarm plots for your regression scatter plots. Swarm plots are sexy, but not in the laughably uncomfortable way of the very similar violin plot.

Yep, the leaves are predictor nodes whereas internal nodes are decision nodes. They are doing different things so we figured we should show them using different visualizations.
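To make the decision-node vs. predictor-node distinction concrete, here is a minimal Python sketch. It assumes a hand-built binary tree rather than a fitted scikit-learn model, and the class names `Decision` and `Leaf` and the function `predict` are hypothetical. Internal nodes only route a sample left or right by comparing one feature with a threshold; only leaves hold a prediction.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    # Predictor node: the value returned for every sample that lands here.
    prediction: str


@dataclass
class Decision:
    # Decision node: routes a sample by testing one feature against a threshold.
    feature: int
    threshold: float
    left: Node   # taken when x[feature] <= threshold
    right: Node  # taken otherwise


Node = Union[Leaf, Decision]


def predict(node: Node, x: list[float]) -> str:
    # Follow decision nodes until a leaf is reached, then return its prediction.
    while isinstance(node, Decision):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# Hypothetical toy tree: split on feature 0, then on feature 1 in the right branch.
tree = Decision(0, 2.5,
                Leaf("class A"),
                Decision(1, 1.75, Leaf("class B"), Leaf("class C")))
```

Because the two node kinds do different jobs, it is natural for a visualization to draw them differently: a decision node is about where the threshold falls in a feature's distribution, a leaf is about what gets predicted.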

Wow, I wondered why you put a TM on Random Forests. I guess it is a trademark of Salford Systems, which is kind of weird. Maybe we can just call them random forests and ignore that.

> Maybe we can just call them random forests and ignore that.

Legally, yes, you can, as the use is not mandatory:


> Although owners of trademarked names may suggest otherwise, publishers are not obligated to denote the trademark status of a name when that name is mentioned in text. Authors representing trademark owners frequently feel obligated to use the trademark or registered-trademark symbol (™ or ®) after the first mention of their product names but often do not use these symbols consistently to indicate the trademark status of other names not owned by their particular sponsor or employer.

The people who own the trademark may feel obligated to use those marks, but nobody else ever is.

There's a lot of "folk law" (that is, urban legends repeated by the ignorant) surrounding this concept, so if you think I'm wrong, please do yourself and the rest of us a favor and research good cites to show that there's actual law saying I'm wrong. Thanks.

I'm often guilty of this too - but we really should put the (tm) there. It's nice that they made code of the algorithm publicly available and all they ask is that we respect their trademark in return. I think that's more than fair. :)

(I discussed this a few years ago with the co-inventor of random forests, Adele Cutler, and she confirmed that this is something that she wants to see happen.)

Are algorithms patentable? Last I checked, in the US they were. Are they copyrightable?

Not the answer to your question, but in case it helps anyone: trademarks are unrelated to patents. You can use a random forest, but you cannot call it a “random forest”. “Aleatory jungle” is fine, though.

"stochastic treeset". Sounds way more scientific, which can be required to convince a pointy-haired boss. "Random" forest sounds... well, I can flip a coin too, how is that going to solve my problem?

For the same reason, "naive" Bayes classifiers are very hard to sell, to the point that I stopped naming them and now just say "a very fast machine learning algorithm", unless specifically asked.

There's a nice interactive version of a decision tree diagram here, in the section "Growing a Tree".


Indeed. They were the inspiration for this visualization. I wanted to do something for my book with Jeremy Howard https://mlbook.explained.ai/ and those guys show the way, but of course it isn't a general library. Love that r2d3.us page.

This is the first time ever that a website has done something when I scroll and I haven't been mad.

I was particularly impressed with the continuous scroll which lets you go through the animations frame-by-frame, so to speak.

I have a model trained with XGBoost Java. Can I take the model file and read it with scikit-learn to visualize it with this library?

Good to see others looking into tree model viz. I've done work with larger-scale tree visualizations and found you quickly run out of space. I wound up using interactivity to reveal branch-level info, dynamically pruned the tree based on training support, and used a more sophisticated layout technique to pack more info in. https://www.google.com/amp/s/blog.bigml.com/2012/01/23/beaut...

FWIW visualizing trees like that helps spot problems really quickly. Overfitting behavior typically involves overusing a certain field, or growing long and relatively narrow branches.
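Both of those symptoms can also be put into numbers. The Python sketch below assumes the tree is stored as nested dicts, a hypothetical format that is neither dtreeviz's nor scikit-learn's, and the function name `split_stats` is likewise hypothetical. It counts how many splits use each feature and finds the deepest root-to-leaf path. One feature dominating the counts, or a depth far beyond what the data size justifies, is a cue to look at the picture more closely; neither number proves overfitting on its own.

```python
from __future__ import annotations

from collections import Counter


def split_stats(node: dict, depth: int = 0,
                counts: Counter | None = None) -> tuple[Counter, int]:
    """Return (number of splits per feature, maximum leaf depth).

    Assumed format: a decision node is {"feature": name, "left": ..., "right": ...};
    any dict without a "feature" key is treated as a leaf.
    """
    if counts is None:
        counts = Counter()
    if "feature" not in node:
        return counts, depth  # leaf: report how deep it sits
    counts[node["feature"]] += 1
    _, left_depth = split_stats(node["left"], depth + 1, counts)
    _, right_depth = split_stats(node["right"], depth + 1, counts)
    return counts, max(left_depth, right_depth)


# Hypothetical tree that keeps re-splitting on "age" down one long branch.
tree = {"feature": "age",
        "left": {"value": 0},
        "right": {"feature": "age",
                  "left": {"feature": "age",
                           "left": {"value": 1}, "right": {"value": 0}},
                  "right": {"feature": "income",
                            "left": {"value": 1}, "right": {"value": 1}}}}
counts, max_depth = split_stats(tree)
```

Here `counts.most_common(1)` points at "age" (three of the four splits), which is exactly the kind of pattern a tree drawing makes visible at a glance.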

Thanks for that link. Super useful. Looks like BigML uses same layout I did for ANTLR parse trees. Really packs stuff in; e.g., https://cdn-images-1.medium.com/max/1760/1*k0mO4kJyQvPCyyev0...

Yeah, the general algorithm is "Reingold-Tilford" with some tweaks from Buchheim.
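For readers who have not met it, here is a simplified Python sketch of the core Reingold-Tilford idea: lay out each subtree independently, push sibling subtrees apart just far enough that their contours stay at least `sep` apart at every depth, and center each parent over its first and last child. This is an assumption-laden simplification, not the real algorithm: it compares contours level by level with dictionaries, so it can be quadratic, whereas Reingold-Tilford uses threads to run in linear time, and the later refinements by Walker and by Buchheim et al. (such as spreading out small middle subtrees in n-ary trees) are omitted. `Node`, `offset`, and `tidy_layout` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)
    offset: float = 0.0  # x relative to the parent, filled in by _layout


def _layout(node: Node, sep: float) -> tuple[dict[int, float], dict[int, float]]:
    """Lay out node's subtree with node at x = 0.

    Returns the left and right contours: relative depth -> min / max x.
    """
    if not node.children:
        return {0: 0.0}, {0: 0.0}
    left_acc: dict[int, float] = {}   # left contour of siblings placed so far
    right_acc: dict[int, float] = {}  # right contour of siblings placed so far
    child_x: list[float] = []
    for child in node.children:
        left, right = _layout(child, sep)
        shift = 0.0
        if child_x:
            # Smallest shift keeping this subtree at least `sep` to the right
            # of every earlier sibling at every depth they share.
            shift = max(right_acc[d] - left[d] + sep for d in left if d in right_acc)
        child_x.append(shift)
        for d in left:
            left_acc[d] = min(left_acc.get(d, float("inf")), left[d] + shift)
            right_acc[d] = max(right_acc.get(d, float("-inf")), right[d] + shift)
    # Center the parent over its first and last child.
    mid = (child_x[0] + child_x[-1]) / 2
    for child, x in zip(node.children, child_x):
        child.offset = x - mid
    left_out, right_out = {0: 0.0}, {0: 0.0}
    for d in left_acc:
        left_out[d + 1] = left_acc[d] - mid
        right_out[d + 1] = right_acc[d] - mid
    return left_out, right_out


def tidy_layout(root: Node, sep: float = 1.0) -> dict[str, tuple[float, int]]:
    """Return {node name: (x, depth)} with the root at x = 0 (names must be unique)."""
    _layout(root, sep)
    coords: dict[str, tuple[float, int]] = {}

    def place(node: Node, x: float, depth: int) -> None:
        coords[node.name] = (x, depth)
        for child in node.children:
            place(child, x + child.offset, depth + 1)

    place(root, 0.0, 0)
    return coords


tree = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
coords = tidy_layout(tree)
```

The contour comparison is what lets the layout pack nodes tightly: a narrow subtree can tuck in next to a wide one as long as the two never come closer than `sep` at any shared depth.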


Really good algorithm to have in an arbitrary visualization toolkit.

Just a note that dtreeviz works cross-platform now! Mac, Windows, Linux. "pip install -U dtreeviz" See more at https://github.com/parrt/dtreeviz

If you model the decision tree as a directed graph, there’s no reason you couldn’t export it into a Doom/Quake level or even Minecraft or Roblox

Heh, that’s a cool idea. Fly through the tree like a maze.

Not sure about the choice of pie chart as the default leaf format (humans are bad at guessing proportions from pie charts) but otherwise it does look great and convey the information efficiently.

howdy! We use a pie chart for classifier leaves, despite their bad reputation. For the purpose of indicating purity, the viewer only needs an indication of whether there is a single strong majority category. The viewer does not need to see the exact relationship between elements of the pie chart, which is one key area where pie charts fail.
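As a rough illustration of what the pie is being asked to convey, here is a minimal Python sketch, assuming a leaf is summarized only by the class labels of the training samples that reach it (the function name `leaf_purity` is hypothetical, and the leaf is assumed to be non-empty). The share of the majority class is what the pie shows at a glance; Gini impurity is a common single-number summary of the same purity.

```python
from __future__ import annotations

from collections import Counter


def leaf_purity(labels: list[str]) -> tuple[str, float, float]:
    """Return (majority class, majority fraction, Gini impurity) for one leaf.

    Assumes `labels` is non-empty.
    """
    counts = Counter(labels)
    n = len(labels)
    majority, majority_count = counts.most_common(1)[0]
    # Gini impurity: 1 - sum of squared class proportions (0 for a pure leaf).
    gini = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return majority, majority_count / n, gini


majority, frac, gini = leaf_purity(["yes"] * 9 + ["no"])
```

A leaf like this one (90% "yes", Gini about 0.18) would draw as an almost solid pie, which is the "single strong majority" signal; the exact slice sizes matter much less.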

Good, very informative. A lot of information is distilled into the drawing.

