The model in the paper isn't a "real" RNN due making it parallelizable, for same... | Hacker News

Hacker News new | past | comments | ask | show | jobs | submit

login

logicchains 3 months ago | parent | context | favorite | on: Were RNNs all we needed?

The model in the paper isn't a "real" RNN due making it parallelizable, for same the reasons described in https://arxiv.org/abs/2404.08819 , and hence is theoretically less powerful than a "real" RNN (struggles at some classes of problems that RNNs traditionally excel at). On the other hand, https://arxiv.org/abs/2405.04517 contains a "real" RNN component, which demonstrates a significant improvement on the kind of state-tracking problems that transformers struggle with.

robertsdionne 3 months ago [–]

These are real RNNs, they still depend upon the prior hidden state, it’s just that the gating does not. The basic RNN equation can be parallelized with parallel prefix scan algorithms.

Consider applying for YC's Spring batch! Applications are open till Feb 11.
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact