Optimizing a neural network is not really minimization. Sampling the training data in batches introduces a stochastic element, so a better analogy is perhaps something like Langevin dynamics. Decreasing the learning rate would then very roughly correspond to decreasing the temperature, as in simulated annealing. (These analogies are only approximate.)
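
To make the analogy concrete, here is a minimal toy sketch (not from the comment itself; the quadratic loss, dataset, step size, and cooling schedule are all illustrative assumptions). It runs minibatch SGD, where the noise comes implicitly from batch sampling, next to an explicit Langevin update, where Gaussian noise is injected at a temperature that is gradually annealed:

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=1000)  # toy 1-D dataset, minimizer near 3

    def grad(theta, batch):
        # Gradient of the mean squared loss 0.5*(theta - x)^2 over a batch.
        return np.mean(theta - batch)

    theta_sgd, theta_langevin = 0.0, 0.0
    lr, temperature = 0.1, 0.05

    for step in range(500):
        # SGD: the stochasticity comes from sampling a minibatch.
        batch = rng.choice(data, size=32)
        theta_sgd -= lr * grad(theta_sgd, batch)

        # Langevin dynamics: full-batch gradient plus explicit thermal noise
        # with variance 2 * lr * temperature.
        theta_langevin -= lr * grad(theta_langevin, data)
        theta_langevin += np.sqrt(2 * lr * temperature) * rng.normal()

        # Annealing: cool the temperature, analogous to decaying the learning rate.
        temperature *= 0.995

    print(theta_sgd, theta_langevin)  # both settle near the minimizer at ~3

Under this reading, shrinking the minibatch gradient noise (or the learning rate) plays the same role as lowering the temperature term in the Langevin update: the iterate explores early on and settles into a basin as things cool.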


