Hacker News

> The information cost of making the RNN state way bigger is high when done naively, but maybe someone can figure out a clever way to avoid storing full hidden states in memory during training or big improvements in hardware could make memory use less of a bottleneck.

Isn't this essentially what Mamba [1] does via its 'Hardware-aware Algorithm'?

[1] https://arxiv.org/pdf/2312.00752
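The trick the quoted passage is asking for can be sketched as gradient checkpointing over a toy linear recurrence. This is a simplified stand-in for what Mamba's hardware-aware kernel does (Mamba recomputes intermediate states in SRAM during the backward pass rather than storing them in HBM); all names and the scalar recurrence below are illustrative, not the paper's actual implementation:

```python
def forward_checkpointed(a, x, seg):
    """Run h_t = a*h_{t-1} + x_t (h_0 = 0), storing only every seg-th
    hidden state as a checkpoint instead of all T states."""
    ckpts = {0: 0.0}
    h = 0.0
    for t, xt in enumerate(x, start=1):
        h = a * h + xt
        if t % seg == 0:
            ckpts[t] = h
    return h, ckpts

def backward_checkpointed(a, x, ckpts, seg):
    """BPTT for the loss L = h_T: each segment's hidden states are
    recomputed from its checkpoint during the backward sweep, so the
    extra memory is O(seg) instead of O(T)."""
    T = len(x)
    grad_a = 0.0
    dh = 1.0  # dL/dh_T
    t_end = T
    while t_end > 0:
        t_start = (t_end - 1) // seg * seg  # nearest checkpoint below t_end
        # recompute h_{t_start} .. h_{t_end - 1} for this segment
        h = ckpts[t_start]
        hs = [h]
        for t in range(t_start + 1, t_end):
            h = a * h + x[t - 1]
            hs.append(h)
        # backward through the segment
        for t in range(t_end, t_start, -1):
            grad_a += dh * hs[t - 1 - t_start]  # dh_t/da = h_{t-1}
            dh *= a                             # dh_t/dh_{t-1} = a
        t_end = t_start
    return grad_a
```

With seg ≈ sqrt(T) this is the classic O(sqrt(T))-memory checkpointing trade-off; Mamba pushes the same recompute-instead-of-store idea down to the GPU memory hierarchy.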



