
The zen of gradient descent - peterkshultz
http://blog.mrtz.org/2013/09/07/the-zen-of-gradient-descent.html
======
alfiedotwtf
How do these and other variants compare to simple hill climbing with random
restarts? I feel that gradient descent, simulated annealing, etc. are just
local searches within the search space.
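To make the comparison concrete, here is a minimal sketch of hill climbing with random restarts (the parameters, bounds, and test function are all illustrative, not from the linked post):

```python
import random

def hill_climb_restarts(f, dim, restarts=20, iters=200, step=0.5, seed=0):
    """Hill climbing with random restarts.

    Each restart is a purely local search (only improving moves are
    accepted); the restarts are what make the overall search global.
    """
    rng = random.Random(seed)
    best_x, best_v = None, float("inf")
    for _ in range(restarts):
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]  # random starting point
        v = f(x)
        for _ in range(iters):
            cand = [xi + rng.gauss(0.0, step) for xi in x]
            cv = f(cand)
            if cv < v:  # accept only improving moves: pure hill climbing
                x, v = cand, cv
        if v < best_v:
            best_x, best_v = x, v
    return best_x, best_v

# Toy problem: minimize f(x) = (x - 2)^2, whose minimum is at x = 2.
best_x, best_v = hill_climb_restarts(lambda x: (x[0] - 2) ** 2, dim=1)
```

Each inner loop is exactly the "local search" the comment describes; without the restart loop it can only find the basin it starts in.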

~~~
akssri
Nesterov's method attains an accelerated O(1/k^2) convergence rate on Convex
functions (assuming smoothness; the rate is linear under strong convexity). It
doesn't really (if I remember right) come with guarantees otherwise; but then
this is okay, since Convex problems are really the only ones we can solve in
poly-time (esp. if P ≠ NP and the UGC is true).

SA/RR are global optimization methods which come with very few (if any?)
complexity guarantees.
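For reference, a sketch of Nesterov's accelerated scheme (the step size 1/L and the t-sequence are the textbook choices; the quadratic test function is made up for illustration):

```python
import numpy as np

def nesterov(grad, x0, L, steps=100):
    """Nesterov's accelerated gradient for an L-smooth convex function.

    Attains the O(1/k^2) rate, vs. O(1/k) for plain gradient descent.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(steps):
        x_next = y - grad(y) / L  # gradient step from the lookahead point y
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        # extrapolate past x_next using the previous iterate (the "momentum")
        y = x_next + (t - 1) / t_next * (x_next - x)
        x, t = x_next, t_next
    return x

# Toy problem: f(x) = ||x||^2 / 2, so grad(x) = x and L = 1.
x_star = nesterov(lambda x: x, x0=[5.0, -3.0], L=1.0)
```

The only change from plain gradient descent is the extrapolation step; that alone improves the rate from O(1/k) to O(1/k^2) in the smooth convex case.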

~~~
alfiedotwtf
"on Convex functions"

Ah thanks, I thought I missed something. That makes total sense.

"SA/RR are global optimization methods"

RR is definitely global, but (depending on your cooling function I guess) I
would say SA is not.
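The cooling-schedule point is visible in a minimal SA sketch (proposal width, initial temperature, and geometric cooling rate are all illustrative choices):

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.995, steps=2000, seed=0):
    """Simulated annealing on a 1-D function.

    Unlike pure hill climbing, worsening moves are accepted with
    probability exp(-delta / T), so while T is high the search can hop
    between basins. If the schedule cools too fast, SA degenerates into
    hill climbing; only a slow enough schedule makes it "global".
    """
    rng = random.Random(seed)
    x, v = x0, f(x0)
    best_x, best_v = x, v
    T = t0
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 0.5)
        cv = f(cand)
        # accept improvements always, worsening moves with prob exp(-delta/T)
        if cv < v or rng.random() < math.exp(-(cv - v) / T):
            x, v = cand, cv
            if v < best_v:
                best_x, best_v = x, v
        T *= cooling  # geometric cooling schedule
    return best_x, best_v

# Toy bimodal problem: f(x) = (x^2 - 1)^2 has minima at x = +1 and x = -1;
# start well outside either basin.
best_x, best_v = simulated_annealing(lambda x: (x * x - 1) ** 2, x0=3.0)
```

With `cooling` close to 1 the acceptance of uphill moves persists long enough to escape bad basins; set it small and the `exp(-delta/T)` term vanishes almost immediately, which is the "SA is not global" regime.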

~~~
akssri
It should be noted that things like momentum in DNN training are inspired
directly by Nesterov's method, and apparently work very well.
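The momentum update itself is tiny; here is a sketch of the classical (Polyak-style) heavy-ball variant, which is what most DNN frameworks call "momentum" (learning rate and momentum coefficient are illustrative):

```python
import numpy as np

def sgd_momentum(grad, x0, lr=0.1, mu=0.9, steps=200):
    """Gradient descent with classical (heavy-ball) momentum.

    A velocity term accumulates an exponentially decayed history of
    past gradients. Nesterov momentum differs only in evaluating the
    gradient at the lookahead point x + mu * v instead of at x.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = mu * v - lr * grad(x)  # decayed gradient history
        x = x + v                  # move along the accumulated velocity
    return x

# Toy problem: f(x) = ||x||^2 / 2, minimized at the origin.
x_min = sgd_momentum(lambda x: x, x0=[4.0, -2.0])
```

On badly conditioned losses, the velocity term damps oscillation across steep directions while accumulating speed along shallow ones, which is the usual intuition for why it helps in DNN training.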

