
Does this essentially mean that any multi-layer RNN can be reasonably approximated by a 1-layer network (something like a perceptron) for "playback" purposes, that is, for recognition / transformation, not learning?

This may have colossal practical implications, as long as the approximation stays good enough.



Hmmm, I think that's not precise and my use of "architecture" was misleading.

If we're thinking in terms of "universal approximators", an RNN is a way to make a sequence of approximate functions for a sequence of inputs.

But it's still a sequence of functions, not a single function.

For a 1-layer network to have the same ability as an RNN (taking an unbounded amount of context), it would need infinite width, which is a no-go.
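
To make that concrete, here's a minimal numpy sketch (dimensions and names are made up for illustration): the same recurrent step is reused at every time step, so the RNN defines one function per sequence length rather than a single fixed-input-width function.

    import numpy as np

    rng = np.random.default_rng(0)
    d_x, d_h = 8, 16
    W_x = rng.normal(size=(d_h, d_x)) * 0.1
    W_h = rng.normal(size=(d_h, d_h)) * 0.1
    b = np.zeros(d_h)

    def step(h, x):
        # One recurrent update: h_t = tanh(W_h h_{t-1} + W_x x_t + b)
        return np.tanh(W_h @ h + W_x @ x + b)

    def run(xs):
        # The "function" computed for a length-T input is step composed T times.
        h = np.zeros(d_h)
        for x in xs:
            h = step(h, x)
        return h

    # The same weights handle any T; a fixed-width feed-forward layer
    # would need its input dimension chosen up front.
    for T in (1, 5, 50):
        print(T, run(rng.normal(size=(T, d_x))).shape)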


I would be skeptical about thinking of networks this way without empirically verifying it yourself.

The only useful trick I've found like that is that a stack of linear layers with no activation function is equivalent to a single larger layer. Sometimes it enables some clever optimizations on TPUs, since you want one of the dimensions to be a multiple of 128. (I haven't actually used that trick, but it's in my back pocket.)
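
For anyone curious, here's a tiny numpy sketch of that fold (dimensions are arbitrary, purely illustrative): two matmul-plus-bias layers with nothing between them collapse into one matmul and one bias.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden, d_out = 64, 256, 128
    W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
    W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)
    x = rng.normal(size=d_in)

    # Two linear layers applied in sequence, no activation in between.
    y_stacked = W2 @ (W1 @ x + b1) + b2

    # Folded into a single equivalent layer.
    W = W2 @ W1            # shape (d_out, d_in)
    b = W2 @ b1 + b2
    y_folded = W @ x + b

    assert np.allclose(y_stacked, y_folded)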

But thinking of an entire model as a single layer seems strange. A single layer has to have some kind of meaning. To me, it means “a linear mapping followed by a nonlinear activation function.” So is the claim that there exists a sufficiently complicated activation function that approximates any given model? Because that sounds an awful lot like the activation function itself might be “the model”. Except that makes no sense, because activation functions don’t use model weights; the linear multiply before the activation does that.

So it quickly takes me in circles. I don’t have a good intuition for models yet though.


Wouldn't this one-layer network be a lot less "compressive" than the multi-layer net, and in some sense "duplicate" subnetworks in earlier layers?



