
Basic Methods for Finding Zeroes and Mins / Maxes of Functions
https://blog.demofox.org/2020/03/17/basic-methods-for-finding-zeroes-and-mins-maxes-of-functions/
======
montecarl
The author points out that Newton's method needs a good initial guess. This is
especially true if the function is not well approximated by a quadratic
polynomial. If, for example, you want the optimization to converge to a nearby
minimum (and not explode off towards infinity), a very simple fix is to impose
a maximum step size that limits how far the next x value can be from the
current one.

This technique is used when optimizing atomic structures, where the energy
changes rapidly as a function of distance between the atoms. Newton's method
(or more typically, one of its approximations, like L-BFGS) without a maximum
step size would just blast all the atoms apart, but with a maximum step size,
it works extremely well.

This technique does require some domain knowledge. That is, what step size is
"big", which is not always easy to know.
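The step-clamping idea above can be sketched in a few lines. This is a minimal illustration (not the L-BFGS variant used for atomic structures): a Newton root-finding iteration where the raw step `f(x)/f'(x)` is clamped to `max_step`. The example function `atan(x)` is a classic case where plain Newton diverges from `x0 = 2`, but the clamped version walks back to the root at 0.

```python
import math

def newton_clamped(f, df, x0, max_step=1.0, tol=1e-10, max_iter=100):
    """Newton's method with a maximum step size, so each iterate
    stays within max_step of the previous one (illustrative sketch)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        step = fx / df(x)
        # Clamp the raw Newton step to max_step in magnitude.
        step = max(-max_step, min(max_step, step))
        x -= step
    return x

# Plain Newton on atan(x) from x0 = 2 overshoots and diverges
# (the first raw step is about 5.5); clamping it converges to 0.
root = newton_clamped(math.atan, lambda x: 1 / (1 + x * x), x0=2.0)
print(root)  # converges to the root at x = 0
```

Picking `max_step` is exactly the domain-knowledge question: it should be on the order of the distance over which the function is roughly quadratic.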

------
pixelpoet
Brent's method is a nice combination of several of these methods, maintaining
the robustness of bisection while approaching the speed of Newton:
[https://en.wikipedia.org/wiki/Brent%27s_method](https://en.wikipedia.org/wiki/Brent%27s_method)

Of course the roots need to be isolated/bracketed first, and for this one can
use
[https://en.wikipedia.org/wiki/Sturm's_theorem](https://en.wikipedia.org/wiki/Sturm's_theorem)
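Brent's method itself is intricate; as a sketch of just its guaranteed-progress fallback, here is plain bisection on an already-bracketed root (assuming `f(a)` and `f(b)` have opposite signs):

```python
def bisect(f, a, b, tol=1e-12, max_iter=200):
    """Bisection: repeatedly halve a bracketing interval [a, b]
    with f(a) * f(b) < 0. This is the robust fallback step that
    Brent's method retains alongside its faster interpolation steps."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:
            return m
        if fa * fm < 0:        # root lies in [a, m]
            b, fb = m, fm
        else:                  # root lies in [m, b]
            a, fa = m, fm
    return 0.5 * (a + b)

# Root of x**2 - 2 bracketed in [1, 2]:
root = bisect(lambda x: x * x - 2, 1.0, 2.0)
```

In practice one would just call `scipy.optimize.brentq(f, a, b)`, which implements the full method.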

------
lostmsu
BTW, this is basically how all neural networks work too.

~~~
Der_Einzige
Uhh, not all, but maybe most in production. You can use any optimization
technique you want to train the weights, including things like evolutionary
algorithms or simulated annealing, which are entirely different from what's
listed here. Evolutionary-style methods may be SOTA for continuous control
reinforcement learning problems... Consider how backprop or hill climbing or
L-BFGS performs on something basic like cart pole.
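As a toy illustration of the gradient-free family being described, here is a minimal (1+1) evolution strategy (a sketch, not the OpenAI ES algorithm, which uses populations and a gradient estimate): mutate the current point with Gaussian noise and keep the mutant only if it improves the objective.

```python
import random

def one_plus_one_es(f, x0, sigma=0.5, iters=2000, seed=0):
    """(1+1) evolution strategy: perturb the current point with
    Gaussian noise; accept the mutant only if it scores better.
    Uses no gradients at all."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    for _ in range(iters):
        y = x + rng.gauss(0, sigma)
        fy = f(y)
        if fy < fx:            # greedy selection
            x, fx = y, fy
    return x

# Minimize f(x) = (x - 3)**2 starting far from the optimum.
best = one_plus_one_es(lambda x: (x - 3) ** 2, x0=-10.0)
```

Real ES variants also adapt `sigma` over time (e.g. the 1/5 success rule), which this sketch omits.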

~~~
lostmsu
Evolutionary algorithms are SOTA for continuous control? Never saw a paper
claiming that. Which one of them can even compare to soft actor-critic?

I believe gradient descent is the king everywhere interesting right now (e.g.
CV, NLP, RL).

~~~
Der_Einzige
[https://openai.com/blog/evolution-strategies/](https://openai.com/blog/evolution-strategies/)

~~~
lostmsu
I have seen this paper. The blog post does not report that they were able to
train a better agent than A3C, only that ES allowed them to use more compute
power to train in parallel.

------
WhiteNoiz3
I've been following this blog for a while and the author does a great job
making complex math topics understandable for programmers. Recommended!

------
jansan
There are also bracketing methods, where you first isolate a root to an
interval and then approximate the root within that interval. Regula falsi and
the secant method are examples of these, and in my experience regula falsi in
particular works very well.
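A minimal regula falsi (false position) sketch: like bisection it keeps a bracketing interval, but the next candidate is where the secant line through the two endpoints crosses zero, which usually converges much faster.

```python
import math

def regula_falsi(f, a, b, tol=1e-12, max_iter=200):
    """False position: keep a bracket [a, b] with f(a) * f(b) < 0,
    but pick the next point where the secant line through
    (a, f(a)) and (b, f(b)) crosses zero instead of the midpoint."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    c = a
    for _ in range(max_iter):
        c = (a * fb - b * fa) / (fb - fa)   # secant intersection
        fc = f(c)
        if abs(fc) < tol:
            return c
        if fa * fc < 0:        # root lies in [a, c]
            b, fb = c, fc
        else:                  # root lies in [c, b]
            a, fa = c, fc
    return c

# Root of cos(x) - x bracketed in [0, 1]:
root = regula_falsi(lambda x: math.cos(x) - x, 0.0, 1.0)
```

One known caveat: on functions that are convex or concave over the bracket, one endpoint never moves, so the interval width alone is not a good stopping criterion; the residual `|f(c)|` is used here instead.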

