
Newton's method and linear algebra. High school math, basically.


> For instance, local entropy, the loss that Entropy-SGD minimizes, is the solution of the Hamilton-Jacobi-Bellman PDE and can therefore be written as a stochastic optimal control problem, which penalizes greedy gradient descent. This direction further leads to connections between variants of SGD with good empirical performance and standard methods in convex optimization such as inf-convolutions and proximal methods.

You clearly didn't read the article. What are you commenting on, then? It seems to be about your own understanding, since it's certainly not about the article.


> Newton's method

Where are you seeing Newton's method? I didn't think second-order information was tractable for typical models in statistical machine learning.


My understanding is that the issue is that the full Hessian of the loss is too expensive to compute at each step relative to the speedup in learning that it buys.
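
To make that concrete, here's a rough back-of-the-envelope sketch (my own numbers, not from the article): for a model with n parameters the gradient has n entries but the Hessian has n^2, so even storing it, let alone inverting it every step, blows up fast at modern model sizes.

    # Back-of-the-envelope: memory for a full Hessian vs. a gradient,
    # assuming float32 parameters. Hypothetical 10M-parameter model.
    n = 10_000_000
    bytes_per_float = 4

    grad_bytes = n * bytes_per_float           # one entry per parameter
    hessian_bytes = n * n * bytes_per_float    # n x n second-derivative matrix

    print(f"gradient: {grad_bytes / 1e9:.2f} GB")    # ~0.04 GB
    print(f"Hessian:  {hessian_bytes / 1e12:.0f} TB")  # ~400 TB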


Yeah I think that's why quasi-Newton methods like BFGS have been developed.
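
For anyone curious, here's a minimal sketch of what that looks like in practice, using SciPy's BFGS on a toy badly-scaled quadratic (the problem and numbers are made up for illustration, not from the article). BFGS builds an approximation to the inverse Hessian from gradient differences, so it never forms or inverts the full Hessian.

    import numpy as np
    from scipy.optimize import minimize

    # Toy objective: a badly scaled quadratic, the kind of landscape where
    # plain gradient descent crawls but a quasi-Newton method converges fast.
    A = np.diag([1.0, 100.0])
    b = np.array([1.0, -2.0])

    def f(x):
        return 0.5 * x @ A @ x - b @ x

    def grad(x):
        return A @ x - b

    x0 = np.zeros(2)
    res = minimize(f, x0, jac=grad, method="BFGS")
    print(res.x, res.nit)  # converges to A^{-1} b in a handful of iterations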




