Basic Methods for Finding Zeroes and Mins / Maxes of Functions (demofox.org)
65 points by luu 13 days ago | 9 comments





The author points out that Newton's method needs a good initial guess. This is especially true if you have a function that is not well approximated by a quadratic polynomial. If, for example, you want your optimization to converge to a nearby minimum (and not explode off towards infinity), a very simple fix is to implement a maximum step size that limits how far the next x value can be from the current one.

This technique is used when optimizing atomic structures, where the energy changes rapidly as a function of distance between the atoms. Newton's method (or more typically, one of its approximations, like L-BFGS) without a maximum step size would just blast all the atoms apart, but with a maximum step size, it works extremely well.

This technique does require some domain knowledge: you need to know what step size counts as "big", which is not always easy.
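
Here's a minimal sketch of that fix in Python (the test function, its derivatives, and the max_step value are illustrative assumptions on my part, not something from the article):

    def newton_minimize(df, d2f, x, max_step=0.5, iters=50, tol=1e-10):
        # Newton's method on the derivative: step = -f'(x) / f''(x),
        # but no single step is allowed to exceed max_step.
        for _ in range(iters):
            step = -df(x) / d2f(x)
            step = max(-max_step, min(max_step, step))  # clamp the step
            x += step
            if abs(step) < tol:
                break
        return x

    # Minimize f(x) = x^4 - 3x^2 + x starting from x = 2; the clamp
    # bounds how far any single update can move x, so the iterate
    # settles into the nearby minimum at x ~= 1.13.
    df  = lambda x: 4*x**3 - 6*x + 1
    d2f = lambda x: 12*x**2 - 6
    print(newton_minimize(df, d2f, x=2.0))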


BTW, this is basically how all neural networks work too.

Uhh, not all, but maybe most in production. You can use any optimization technique you want to train the weights, including things like evolutionary algorithms or simulated annealing, which are entirely different from what's listed here. Evolutionary-style methods may be SOTA for continuous control reinforcement learning problems. Consider how backprop or hill climbing or L-BFGS performs on something basic like cart pole.

Evolutionary algorithms are SOTA for continuous control? I've never seen a paper claiming that. Which of them can even compare to Soft Actor-Critic?

I believe gradient descent is king everywhere interesting right now (e.g. CV, NLP, RL).



I have seen this paper. The blog post does not report that they were able to train a better agent than A3C, only that ES allowed them to use more compute to train in parallel.

Brent's method is a nice combination of several of these methods, maintaining the robustness of bisection while approaching the speed of Newton's method: https://en.wikipedia.org/wiki/Brent%27s_method

Of course the roots need to be isolated/bracketed first; for polynomials one can use https://en.wikipedia.org/wiki/Sturm's_theorem
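
In practice you rarely write Brent's method yourself; SciPy exposes it as brentq, which requires exactly that kind of bracket. A quick sketch (the polynomial and bracket here are just an illustrative example):

    from scipy.optimize import brentq

    # The bracket [a, b] must satisfy f(a) * f(b) < 0, i.e. the root
    # has to be isolated to the interval first.
    f = lambda x: x**3 - 2*x - 5
    print(brentq(f, 1, 3))  # ~2.0946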


There are also bracketing methods, where you first have to isolate a root to an interval and then approximate the root within that interval. The regula falsi and secant methods are examples of these, and in my experience regula falsi in particular works very well.
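
A minimal regula falsi sketch in Python, assuming the root is already bracketed in [a, b] with f(a) and f(b) of opposite sign (the test polynomial is an illustrative choice):

    def regula_falsi(f, a, b, iters=100, tol=1e-12):
        # Each iteration replaces one endpoint of the bracket with the
        # x-intercept of the secant line through (a, f(a)) and (b, f(b)).
        fa, fb = f(a), f(b)
        assert fa * fb < 0, "root must be bracketed"
        c = a
        for _ in range(iters):
            c = (a * fb - b * fa) / (fb - fa)  # secant x-intercept
            fc = f(c)
            if abs(fc) < tol:
                break
            if fa * fc < 0:
                b, fb = c, fc  # root is in [a, c]
            else:
                a, fa = c, fc  # root is in [c, b]
        return c

    print(regula_falsi(lambda x: x**3 - 2*x - 5, 1, 3))  # ~2.0946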

I've been following this blog for a while and the author does a great job making complex math topics understandable for programmers. Recommended!


