
None of the parties mentioned actually deny the above equivalence. Backpropagation became a popular idea in deep learning because people started developing continuous models, in which the output (and hence the error) is a continuous, differentiable function of the inputs and the weights. That made it possible to compute the gradients via the chain rule and to train with gradient descent methods. It was this shift from discrete units to continuous units that was termed error backpropagation, not just the chain rule itself.
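
For concreteness, here is a minimal sketch in plain NumPy (the data, names, and learning rate are illustrative, not from the comment above): a single sigmoid unit trained by gradient descent, where the "backpropagation" step is literally the chain rule applied to a differentiable loss.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 3))                              # toy inputs
    y = (x @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)     # toy targets

    w = np.zeros(3)
    lr = 0.1
    for _ in range(500):
        z = x @ w                            # pre-activation
        p = 1.0 / (1.0 + np.exp(-z))         # sigmoid: continuous and differentiable
        # Squared-error loss L = mean((p - y)^2); chain rule gives
        # dL/dw = dL/dp * dp/dz * dz/dw
        grad = (2 * (p - y) * p * (1 - p)) @ x / len(y)
        w -= lr * grad                       # gradient descent step

With a discrete (step-function) unit, dp/dz is zero almost everywhere, so the same chain-rule decomposition yields no usable gradient; the continuous sigmoid is what makes the update step meaningful.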



It's kind of unfortunate, as it has forced everything to be continuous, which is neither very computationally efficient nor easily interpreted by humans.



