
My understanding is that an LSTM is a kind of RNN.


Yes, it is. LSTMs were developed to fix the vanishing gradient problem in RNNs.

The 1997 paper where they were introduced puts it like this:

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

https://www.researchgate.net/publication/13853244_Long_Short...
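For concreteness, here is a minimal numpy sketch of a single LSTM step (the now-common variant with a forget gate, which was added a couple of years after the 1997 paper). The additive cell-state update c = f*c_prev + i*g is the "constant error carousel" the abstract refers to: gradients can flow along that path without being repeatedly squashed through a nonlinearity, which is what mitigates the vanishing gradient. W, U, and b are just illustrative parameter names, not anything from the paper.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        # W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,)
        z = W @ x + U @ h_prev + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
        g = np.tanh(g)                                # candidate cell update
        c = f * c_prev + i * g                        # additive "carousel" path
        h = o * np.tanh(c)                            # gated hidden state
        return h, c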

They usually aren't competitive with transformers on long-range understanding problems, though.




