backpropagation didn't get worked out for training neural networks until the '80s, weirdly. before then people were using genetic algorithms to train them.
and it was only in the last decade that the vanishing gradients problem was tamed.
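to make the vanishing-gradient bit concrete, here's a toy numpy sketch (my own illustration, not from any particular paper): the chain rule multiplies one local activation derivative per layer, sigmoid's is at most 0.25 so the product shrinks geometrically with depth, while ReLU's is exactly 1 wherever the unit is active.

```python
import numpy as np

rng = np.random.default_rng(0)
depth = 30

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Product of local activation derivatives along one path through the net;
# this is the factor the chain rule applies to the gradient at the bottom layer.
sig_grad = 1.0
relu_grad = 1.0

for _ in range(depth):
    z = rng.normal()           # made-up pre-activation at this layer
    s = sigmoid(z)
    sig_grad *= s * (1.0 - s)  # sigmoid' is at most 0.25, so the product collapses
    relu_grad *= 1.0           # ReLU' is exactly 1, assuming the path stays active

print(f"sigmoid path after {depth} layers: {sig_grad:.2e}")  # vanishingly small
print(f"ReLU path after {depth} layers:    {relu_grad:.2e}")  # stays at 1
```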
my impression is that ML researchers were stumbling along in the mathematical dark, until they hit a combination (deep neural nets trained via stochastic gradient descent with ReLU activation) that worked like magic and ended the AI winter.
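and the funny part is that the winning recipe is now about a dozen lines of library code. here's a rough sketch of the combination with stock PyTorch (toy data and hyperparameters are mine, nothing canonical): a small deep ReLU net fit with plain minibatch SGD.

```python
import torch
from torch import nn

torch.manual_seed(0)

# toy regression data: learn y = sin(x) on [-3, 3]
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x)

# the combination in question: a deep stack of ReLU layers...
model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# ...trained with plain stochastic gradient descent
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for step in range(2000):
    idx = torch.randint(0, x.shape[0], (32,))  # random minibatch
    loss = loss_fn(model(x[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```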
Right, and the practice of neural networks has significantly overshot the mathematical theory. Most of the techniques we know work and produce good models have poorly understood theoretical underpinnings. The whole overparameterized thing, for example, or generalization generally. There's a lot that "just works" but we don't know why, hence the stumbling around and landing on stuff that works.