The model in the paper isn't a "real" RNN because of the changes made to keep it parallelizable. For the same reasons described in https://arxiv.org/abs/2404.08819 it is therefore theoretically less powerful than a "real" RNN: it struggles on some classes of problems that RNNs traditionally excel at. On the other hand, https://arxiv.org/abs/2405.04517 contains a "real" RNN component, which shows a significant improvement on the kinds of state-tracking problems that transformers struggle with.



These are real RNNs: the hidden state still depends on the prior hidden state; it's just that the gating does not. The basic linear RNN recurrence, h_t = a_t * h_{t-1} + b_t, can be parallelized with parallel prefix scan algorithms.
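A minimal sketch of why the scan works, assuming scalar gates that depend only on the input x_t (the weights w_a, w_b and the sigmoid/tanh gate forms are hypothetical, not taken from either paper). Each step h -> a_t * h + b_t is an affine map, and composing affine maps is associative, so a prefix scan produces every hidden state in O(log n) rounds. If a_t depended on h_{t-1}, the per-step maps could not be built ahead of time and you would be back to the sequential loop.

```python
import math

def gates(x, w_a=1.5, w_b=0.8):
    # Hypothetical scalar gates: each depends only on the input x_t,
    # never on the previous hidden state, which is what allows a scan.
    a = [1.0 / (1.0 + math.exp(-w_a * xt)) for xt in x]
    b = [(1.0 - at) * math.tanh(w_b * xt) for at, xt in zip(a, x)]
    return a, b

def sequential(a, b, h0=0.0):
    # Reference loop: h_t = a_t * h_{t-1} + b_t, one step at a time.
    hs, h = [], h0
    for at, bt in zip(a, b):
        h = at * h + bt
        hs.append(h)
    return hs

def combine(left, right):
    # Compose two affine maps: apply `left` first, then `right`.
    # right(left(h)) = a_r * (a_l * h + b_l) + b_r
    a_l, b_l = left
    a_r, b_r = right
    return (a_l * a_r, a_r * b_l + b_r)

def parallel_scan(a, b, h0=0.0):
    # Hillis-Steele inclusive scan: O(log n) rounds. Within a round every
    # position is independent, so on parallel hardware a round is one step.
    elems = list(zip(a, b))
    n = len(elems)
    d = 1
    while d < n:
        prev = elems[:]  # read only from the previous round
        for i in range(d, n):  # these iterations could run in parallel
            elems[i] = combine(prev[i - d], prev[i])
        d *= 2
    # elems[i] is now the composed map for steps 0..i; apply it to h0.
    return [A * h0 + B for A, B in elems]

x = [0.3, -1.2, 2.0, 0.5, -0.7, 1.1, 0.0, -2.3]
a, b = gates(x)
seq = sequential(a, b)
par = parallel_scan(a, b)
print(all(abs(s - p) < 1e-12 for s, p in zip(seq, par)))  # True
```

Hillis-Steele is used here only because it is short; a work-efficient (Blelloch-style) scan does O(n) total work instead of O(n log n) with the same associative `combine`.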



