The fact that the class of NN functions is universal is almost vacuous: the basic idea is that if you allow a "neuron" for every point in your input space, then you can mimic any function you like (i.e. each neuron handles a single input point). Obviously such representations become arbitrarily large.
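
For concreteness, here's a minimal sketch of that construction (my own toy, with narrow Gaussian bumps standing in for the "neurons" since the exact activation doesn't matter): one hidden unit per input point, with the output weights just storing the target values.

    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.linspace(-1.0, 1.0, 50)        # 50 input points
    ys = rng.uniform(-1.0, 1.0, size=50)   # arbitrary target values at those points

    sigma = 0.001                          # bumps far narrower than the point spacing

    def lookup_net(x):
        # Hidden layer: one narrow Gaussian-bump "neuron" per input point.
        hidden = np.exp(-((x[:, None] - xs[None, :]) / sigma) ** 2)
        # Output layer: each unit just emits its memorized target value.
        return hidden @ ys

    # Reproduces all 50 targets essentially exactly ...
    print(np.max(np.abs(lookup_net(xs) - ys)))
    # ... but it needed one unit per point and says nothing in between.

Which is exactly the point: universality bought this way is just memorization, and the width grows with the number of points you want to cover.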
Which almost immediately suggests an answer to why NN learning works: the processes that generate the kinds of datasets humans are interested in are themselves (effectively) polysized networks, and you can probably say things like: the probability of recovering a polysize function from polysize samples is high.
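
As a toy version of that claim (my own sketch, nothing rigorous; the hyperparameters and the use of sklearn's MLPRegressor are just convenient assumptions): generate data with a small fixed "teacher" network, train a comparably small "student" on a modest number of samples, and check whether it generalizes to fresh draws from the same teacher.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    d, n_train, n_test = 5, 2000, 2000

    # "Teacher": a tiny fixed one-hidden-layer tanh net, standing in for the
    # (effectively) polysized process that generated the dataset.
    W_t, v_t = rng.normal(size=(8, d)), rng.normal(size=8)
    def teacher(X):
        return np.tanh(X @ W_t.T) @ v_t

    X_train, X_test = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
    y_train, y_test = teacher(X_train), teacher(X_test)

    # "Student": another small network, trained only on the teacher's samples.
    student = MLPRegressor(hidden_layer_sizes=(32,), activation='tanh',
                           max_iter=5000, random_state=0)
    student.fit(X_train, y_train)

    # If the "data comes from a small network" story holds, error on fresh
    # samples should be small relative to the target's variance.
    print("test MSE:", np.mean((student.predict(X_test) - y_test) ** 2),
          "vs target variance:", y_test.var())

If the test MSE comes out well below the target variance, that's at least consistent with the "polysize function from polysize samples" story; a toy like this obviously doesn't prove the probabilistic claim.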
My gut response was, I think, a less rigorous version of this: Humans care about tasks our own brains are good at. If NNs are in fact decent approximations of the way our brains work, then it makes sense they will be anomalously effective on this subset of the "actual" problem space.
You seem to be saying that ANNs are good at pattern recognition because BNNs are. That just pushes the question back a level: why are BNNs good at it?
OTOH, the parent comment suggested we should turn our attention to the processes that produce the data and look for correspondences with how NNs decompose it.