You're right that the way we typically train Tree Ensembles creates a massive number of rules: the Random Forest in my walkthrough has more than 100,000 leaves per Decision Tree. Once we start grafting them, the number of rules vastly outnumbers the amount of training data.
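To make the leaf counts concrete, here's a small sketch (assuming scikit-learn; the synthetic dataset and forest settings are illustrative, not the article's) that counts leaves, and hence leaf rules, across an ensemble:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data and settings only -- real counts depend on your dataset.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each leaf of each tree is one rule; sum them across the ensemble.
leaves_per_tree = [est.tree_.n_leaves for est in forest.estimators_]
print(sum(leaves_per_tree))
```

With deep, unpruned trees (scikit-learn's default), the total easily runs into the hundreds of thousands on even modest datasets.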
I have some follow-up articles planned that will cover this in more detail, but the short answer is that I feel we often jump to overly complex models up front without fully considering whether the accuracy/complexity tradeoff is worth it. Using Amalgamate I showed how I could halve the number of rules without significantly increasing validation error (+5%). I believe that if we're careful about using more sophisticated techniques (i.e. Boosting and dense/fully-connected/tabular neural networks) then we should be able to create reasonably accurate models that are reasonably straightforward to explain.
I've been trying to work out where to post updates; so far I've been using GitHub & Twitter (both @wagtaillabs). I'll keep posting to HN as well (I just had two orders of magnitude more traffic than any other day).
I'd be more than happy to hear suggestions for places where people could follow along (I've thought about an email list, but I'm not sure how many people actually read email any more).
(While I have you, would you mind adding an email address to your profile that we can contact you at? We do that sometimes when we want to invite a repost.)
Fantastic read! As a long-term fan of RF, a lot of things immediately clicked and made sense. It's also a great new direction compared to the SHAP-style explanations most of the industry is using at the moment.
Yep, all the code is in there. I do have another piece of code that has all the algorithms in a single class (which makes it much easier to use); I'll double-check that it's up to date and post it tonight.
That works to a point, but it doesn't necessarily find all the rules of the model. In the post I walked through a model with three training records (yellow, blue, red) which created six prediction regions. Half of the rules weren't covered by the training data, which makes them hard to find without an efficient algorithm to search out all possible rules. The risk of undiscovered rules is that they may cause unexpected behaviour that leads to bad predictions; and if you haven't described the whole model, it's impossible to know how many of these potentially bad predictions exist.
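A toy sketch of that effect (my own illustrative points and split thresholds, not the post's figures): three training records and a few axis-aligned splits partition the plane into six regions, half of which contain no training data at all:

```python
from itertools import product

# Three hypothetical training records in 2D feature space.
points = {"yellow": (0.2, 0.2), "blue": (0.5, 0.8), "red": (0.8, 0.2)}

# Suppose the ensemble's trees split x at 0.35 and 0.65, and y at 0.5.
# Intersecting those splits yields 3 * 2 = 6 regions ("rules").
x_bins = [(0.0, 0.35), (0.35, 0.65), (0.65, 1.0)]
y_bins = [(0.0, 0.5), (0.5, 1.0)]
regions = list(product(x_bins, y_bins))

# Find which regions actually contain a training record.
covered = set()
for (x, y) in points.values():
    for i, ((x0, x1), (y0, y1)) in enumerate(regions):
        if x0 <= x < x1 and y0 <= y < y1:
            covered.add(i)

print(len(regions))                  # 6 rules in total
print(len(regions) - len(covered))   # 3 rules contain no training data
```

Probing the model only at the training records would never exercise those three empty regions, which is why an exhaustive rule search matters.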
But, I apologize! It's a bit pimped up compared to the one-liner above; I think step 7 in section 4.3 is what I was thinking of :) I did laugh when I dug it out, as I have been working on the first bullet in the conclusion this week!