
Gradient Boosting Explained in 3D - sshb
https://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html
======
enthdegree
I hate to be a curmudgeon, but is machine learning really necessary for a
wavelet decomposition?

Can anyone give an example where this enables you to do things that couldn't
be done better with other techniques?

~~~
romaniv
Can a wavelet decomposition be trained to classify feature vectors with many
dimensions?

~~~
enthdegree
Yes, that is what they are for. In this case you probably don't even really
need wavelets; you can just use a Fourier transform.

~~~
romaniv
The Fourier transform is not a classification mechanism. You can transform a
signal into the frequency domain, but that doesn't tell you which frequencies
are most relevant for classifying samples.
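
To illustrate with a toy numpy sketch (my own made-up example, with a hypothetical 10 Hz discriminative component): the FFT supplies the frequency features, but it takes a separate comparison across labeled classes to find out which frequency actually matters.

```python
import numpy as np

rng = np.random.RandomState(0)
t = np.linspace(0, 1, 128, endpoint=False)  # 1 s at 128 Hz, so rfft bin k = k Hz

def make_signal(has_10hz):
    # Both classes share a 3 Hz component plus noise;
    # only one class carries the 10 Hz component.
    s = np.sin(2 * np.pi * 3 * t) + rng.normal(scale=0.3, size=t.size)
    if has_10hz:
        s += np.sin(2 * np.pi * 10 * t)
    return s

labels = np.array([i % 2 == 0 for i in range(100)])
X = np.stack([make_signal(lbl) for lbl in labels])

spectra = np.abs(np.fft.rfft(X, axis=1))  # frequency-domain features

# The FFT alone says nothing about relevance; comparing class means does:
score = np.abs(spectra[labels].mean(axis=0) - spectra[~labels].mean(axis=0))
print(score.argmax())  # the discriminative bin: 10 Hz
```

The transform is just a change of basis; the "which frequencies matter" step is a separate (here deliberately crude) supervised comparison.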

------
minimaxir
Note that the chart appears not to work on iOS: Plot.ly is erroneously showing
a "WebGL not supported" message (which is an error on their end, as the
library's official website shows the same issue. Issue on GitHub:
[https://github.com/plotly/plotly.js/issues/280](https://github.com/plotly/plotly.js/issues/280))

That's a shame, as their API is pretty good, as this demo illustrates.

~~~
arogozhnikov
+1'd the issue. I was sure it was a problem on Safari's side.

------
artursapek
I really need to play with the 3d canvas context API...

~~~
corysama
Starting point
[https://webglfundamentals.org/](https://webglfundamentals.org/)

Most popular library: [http://threejs.org/](http://threejs.org/)

Don't miss [https://acko.net/blog/mathbox2/](https://acko.net/blog/mathbox2/)
and [https://aframe.io/](https://aframe.io/)

~~~
artursapek
Thanks!

------
ced
I haven't read much on gradient boosting, so questions:

1\. Where is the gradient? This explanation makes it sound like a straight
Generalized Additive Model.

2\. In fact, the explanation makes it sound worse than random forests.
Wouldn't it quickly overfit? Where does the boosting come into play?

~~~
sforzando
These helpful, well-written slides explain where the "gradient" comes into
"Gradient Boosting":

[http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/sl...](http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf)

The gist of it is: when you add a new decision tree that fits the residual
error, the new tree is fitting the negative gradient of the loss function
(i.e. the training error). Thus, adding the new decision tree to your existing
ensemble takes a gradient-descent step that minimizes the loss function.
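
As a minimal sketch of that loop (toy 1-D data and hand-rolled regression stumps, my own example rather than the slides' code): with squared loss, the negative gradient at each point is just the residual y - F, so each round fits a small tree to the residuals and adds it with a learning rate.

```python
import numpy as np

def fit_stump(x, r):
    # Depth-1 regression tree: exhaustive search for the best single split.
    best = None
    for thr in np.unique(x):
        left, right = r[x <= thr], r[x > thr]
        if right.size == 0:
            continue
        err = left.var() * left.size + right.var() * right.size
        if best is None or err < best[0]:
            best = (err, thr, left.mean(), right.mean())
    _, thr, left_val, right_val = best
    return lambda z: np.where(z <= thr, left_val, right_val)

rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

F = np.full_like(y, y.mean())      # start from the constant prediction
learning_rate = 0.1
for _ in range(100):
    residual = y - F               # negative gradient of 0.5 * (y - F)^2
    stump = fit_stump(x, residual)
    F += learning_rate * stump(x)  # one gradient-descent step in function space

mse = np.mean((y - F) ** 2)        # shrinks well below np.var(y) over the rounds
```

Swapping in a different loss only changes the residual line (e.g. sign(y - F) for absolute loss); that substitution is what makes the "gradient" framing general.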

Boosting comes in because the model combines several weak learners (individual
trees) into a strong learner (an ensemble of trees). Each individual tree
breaks the input space into piecewise-constant regions that best approximate
the target function. This representation will incur some error, so a new tree
is fit to minimize that error over the entire input space, again by breaking
the input space into piecewise-constant regions, and so on.

So it's boosting, but not in the traditional AdaBoost sense, where the final
model is a linear combination of "dumb" classifiers. Instead, I'd liken it to
a cascade method: each tree T_{n} seeks to fix the errors of the previous tree
T_{n-1}:
[https://en.wikipedia.org/wiki/Cascading_classifiers](https://en.wikipedia.org/wiki/Cascading_classifiers)

There's actually a cool facial landmark detector that uses this same cascading
idea to train an extremely fast (and quite accurate) system. In essence, it
uses a cascade of random forests (in a gradient-boosting framework) to detect
landmarks. The dlib library has a great implementation, along with a
pretrained model. I've used it in my research and, while it's not perfect,
I've been satisfied with its results:
[http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html](http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html)

[http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kazemi_One_Millisecond_Face_2014_CVPR_paper.pdf](http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kazemi_One_Millisecond_Face_2014_CVPR_paper.pdf)

~~~
ced
Those slides were very helpful, thank you.

