Optimizing a neural network is not really minimization. Sampling the training data in batches introduces a stochastic element, so a better analogy is perhaps something like Langevin dynamics. Decreasing the learning rate would then very roughly correspond to decreasing the temperature, as in simulated annealing. (These analogies are only approximate.)
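
To make the analogy concrete, here is a minimal toy sketch (not from the comment itself; the quadratic loss, dataset, step size, and cooling schedule are all illustrative assumptions). It runs minibatch SGD, where the noise comes implicitly from batch sampling, next to an explicit Langevin update, where Gaussian noise is injected at a temperature that is gradually annealed:

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=1000)  # toy 1-D dataset, minimizer near 3

    def grad(theta, batch):
        # Gradient of the mean squared loss 0.5*(theta - x)^2 over a batch.
        return np.mean(theta - batch)

    theta_sgd, theta_langevin = 0.0, 0.0
    lr, temperature = 0.1, 0.05

    for step in range(500):
        # SGD: the stochasticity comes from sampling a minibatch.
        batch = rng.choice(data, size=32)
        theta_sgd -= lr * grad(theta_sgd, batch)

        # Langevin dynamics: full-batch gradient plus explicit thermal noise
        # with variance 2 * lr * temperature.
        theta_langevin -= lr * grad(theta_langevin, data)
        theta_langevin += np.sqrt(2 * lr * temperature) * rng.normal()

        # Annealing: cool the temperature, analogous to decaying the learning rate.
        temperature *= 0.995

    print(theta_sgd, theta_langevin)  # both settle near the minimizer at ~3

Under this reading, shrinking the minibatch gradient noise (or the learning rate) plays the same role as lowering the temperature term in the Langevin update: the iterate explores early on and settles into a basin as things cool.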


