
How to explain gradient boosting - parrt
http://explained.ai/gradient-boosting/index.html
======
parrt
Gradient boosting machines (GBMs) are currently very popular and so it's a
good idea for machine learning practitioners to understand how GBMs work. The
problem is that understanding all of the mathematical machinery is tricky and,
unfortunately, these details are needed to tune the hyper-parameters. (Tuning
the hyper-parameters is required to get a decent GBM model unlike, say, Random
Forests.) Our goal in this article is to explain the intuition behind gradient
boosting, provide visualizations for model construction, explain the
mathematics as simply as possible, and answer thorny questions such as why GBM
performs “gradient descent in function space.” We've split the discussion
into three morsels and a FAQ for easier digestion. Written by Terence Parr and
Jeremy Howard.

~~~
nonbel
> _" The problem is that understanding all of the mathematical machinery is
> tricky and, unfortunately, these details are needed to tune the hyper-
> parameters."_

You don't need to understand anything about the math to run a random, grid,
Bayesian, or any other search of the hyperparameter space.
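
For example, here's a minimal sketch of such a search using scikit-learn's
GridSearchCV (the synthetic dataset and grid values are just illustrative, not
recommendations):

```python
# Minimal sketch: tuning a GBM by grid search, treating it as a black box.
# The synthetic dataset and grid values are illustrative, not recommendations.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3, 5],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```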

~~~
parrt
True, people use grid search, but I am always very uncomfortable using things
as black boxes. How does tree depth affect generalization, etc.? Effectively
using a model means understanding your tools, in my view, but it's easy to get
started w/o the math, as you say!
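
To make the tree-depth question concrete, here is a rough sketch (on a
synthetic dataset, so the numbers themselves mean nothing) of how deeper trees
drive training error down while test error can get worse:

```python
# Rough sketch of how max_depth affects generalization: deeper trees
# fit the training set better but can overfit. Synthetic data only.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 8):
    gbm = GradientBoostingRegressor(max_depth=depth, random_state=0)
    gbm.fit(X_tr, y_tr)
    print(f"depth={depth}  "
          f"train MSE={mean_squared_error(y_tr, gbm.predict(X_tr)):.1f}  "
          f"test MSE={mean_squared_error(y_te, gbm.predict(X_te)):.1f}")
```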

~~~
nonbel
For example look at this tutorial about a regularization hyperparameter:

https://medium.com/data-design/xgboost-hi-im-gamma-what-can-i-do-for-you-and-the-tuning-of-regularization-a42ea17e6ab6

I'd think this is much more useful than anything about the math. How much of
what's described there can you deduce from the math? Isn't this all just
figured out by playing around with it?
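
For context, gamma is XGBoost's minimum loss reduction required to make a
further split; a quick sketch of sweeping it, in the spirit of that tutorial
(data and values are illustrative):

```python
# Sketch of sweeping XGBoost's gamma (minimum split loss) regularization
# parameter, in the spirit of the linked tutorial. Values are illustrative.
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

for gamma in (0, 1, 10, 100):
    model = xgb.XGBRegressor(gamma=gamma, n_estimators=100)
    score = cross_val_score(model, X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()
    print(f"gamma={gamma}: mean CV neg-MSE={score:.1f}")
```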

~~~
parrt
The main point of this article is to explain how gradient boosting works and
why. The math is there to show what the algorithm looks like in its general
form. The discussion of parameters was just a bit of motivation. Think of this
as a good explanation of why it performs gradient descent in function space;
that tends to be very hard to explain.
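
For squared-error loss, the core idea fits in a few lines: each stage fits a
tree to the current residuals, and the residuals are exactly the negative
gradient of the loss with respect to the model's predictions, so adding each
tree is a gradient-descent step in function space. A minimal sketch (function
names are my own, not from the article):

```python
# Minimal gradient boosting sketch for squared-error loss L = 0.5*(y - F)^2.
# The residuals y - F are the negative gradient dL/dF, so each stage
# takes a gradient-descent step in "function space".
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    f0 = np.mean(y)                # stage 0: predict the mean
    F = np.full(len(y), f0)        # current predictions
    trees = []
    for _ in range(n_stages):
        residuals = y - F                         # negative gradient of the loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F = F + learning_rate * tree.predict(X)   # step along the negative gradient
        trees.append(tree)
    return f0, trees

def gbm_predict(X, f0, trees, learning_rate=0.1):
    return f0 + learning_rate * sum(t.predict(X) for t in trees)
```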

~~~
nonbel
I just disagree with the claim that understanding the mathematical machinery
is necessary for tuning hyperparameters. I doubt it is even helpful.

