
I have a question. They show that any given depth-ell network, computing F, is w.h.p. approximated by some subnetwork of a random depth-2ell network.

But there is a theorem that even depth-2 networks can approximate any continuous function F. If the assumptions were the same, then their theorem would imply any continuous function F is w.h.p. approximated by some subnetwork of a depth-4 network.

So what is the difference in assumptions, i.e. what’s the significance of F being computed by a depth-ell network? What functions can a depth-(ell+1) network approximate that a depth-ell network can’t? I’d guess it has to do with Lipschitz assumptions and bounded parameters, but it would be awesome if someone could clarify!




The theorem you mention (universal approximation by depth-2 networks) only holds if the width is allowed to grow without bound as the required accuracy increases.

This paper assumes a neural network is given with fixed width n and fixed depth l. The main result is that a random neural network with depth 2l and width polynomial in n and l contains, with high probability, a subnetwork that can approximate the given network arbitrarily well.
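To give a feel for why pruning a wider, deeper random network can work, here is a toy sketch. It is my own illustration under simplifying assumptions, not the paper's actual construction. It tries to reproduce a single target weight w (the map x -> w*x) by keeping only two neurons of a random two-layer ReLU network, relying on the identity w*x = relu(w*x) - relu(-w*x). The function names, the uniform [-1, 1] initialization, and the width are all hypothetical choices.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def sample_random_net(width, rng):
    """Random two-layer net x -> sum_k v[k] * relu(u[k] * x), weights uniform in [-1, 1]."""
    u = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    return u, v


def prune_to_weight(u, v, w_target):
    """Return the indices of the two neurons to keep.

    One kept neuron has u[k] > 0 and handles inputs x > 0; the other has
    u[k] < 0 and handles x < 0. In both cases we want u[k] * v[k] close to
    w_target. Assumes the net is wide enough that both signs of u occur
    (min() raises ValueError otherwise).
    """
    def best(sign):
        candidates = [k for k in range(len(u)) if sign * u[k] > 0.0]
        return min(candidates, key=lambda k: abs(u[k] * v[k] - w_target))

    return {best(+1), best(-1)}


def pruned_forward(x, u, v, kept):
    """Evaluate the random net with every neuron outside `kept` pruned away."""
    return sum(v[k] * relu(u[k] * x) for k in kept)


if __name__ == "__main__":
    rng = random.Random(0)
    u, v = sample_random_net(2000, rng)
    w = 0.3
    kept = prune_to_weight(u, v, w)
    xs = [i / 10.0 for i in range(-10, 11)]
    worst = max(abs(pruned_forward(x, u, v, kept) - w * x) for x in xs)
    print("kept neurons:", sorted(kept), "max error on [-1, 1]:", worst)
```

No weight is trained here; the only freedom is which neurons survive. The wider the random layer, the more likely it is that some product u[k] * v[k] lands close to the target. Note that one target weight is simulated by two random layers, which is at least consistent with the depth going from l to 2l.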



