it might not be rigorous, but for standard, deep neural networks it's intuitive enough that you could reinvent the idea from scratch in your bedroom, in a time before the internet, just from a vague description of it and thinking about how it /could/ work.
proving convergence may be difficult, but it's not particularly challenging to see why it happens imo. :/
a lot of what the article points at is irrelevant to the subject, e.g. sigmoid functions and network depth.
i think there are some bold, baseless claims here. for instance, linearly interpolating between data points is a pretty simple way to approximate a function, and it's not hard to see how neurons with linear activation functions and some very naive backpropagation can provide this (e.g. evenly dividing the error among the weights and correcting them towards the target).
if this didn't converge, /that/ would be surprising.
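to show what i mean, here's a toy sketch of that naive scheme: a single linear neuron trained with a delta-rule-style update that splits the error between the weight and the bias. the target function, learning rate, and variable names are all mine, not from the article; it's just the "evenly divide the error" idea in its simplest form.

```python
# toy sketch: one linear neuron fitting y = 3x + 2 with a naive
# update rule (push weight and bias each a bit toward the target).
# assumption: lr and target are arbitrary illustrative choices.

w, b = 0.0, 0.0
lr = 0.1
data = [(x / 10, 3 * (x / 10) + 2) for x in range(-10, 11)]

for _ in range(200):
    for x, y in data:
        err = y - (w * x + b)   # how far off the prediction is
        w += lr * err * x       # nudge the weight toward the answer
        b += lr * err           # nudge the bias toward the answer

print(round(w, 2), round(b, 2))  # converges to roughly 3.0 and 2.0
```

with a linear model and a small enough learning rate the error shrinks every pass, which is the whole point: convergence here is the unsurprising outcome.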