
Understanding LSTM Networks (2015) - mkagenius
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
======
rahimnathwani
If you like this and Chris' other posts, don't forget to check out
[https://distill.pub/](https://distill.pub/) which he co-founded. Such high
quality explanations.

------
blueyes
This is also a good one:
[https://deeplearning4j.org/lstm](https://deeplearning4j.org/lstm)

------
stared
It's a great text, but I recommend supplementing it with:
[http://blog.echen.me/2017/05/30/exploring-lstms/](http://blog.echen.me/2017/05/30/exploring-lstms/).

Additionally, the GRU may be a simpler memory unit to understand.
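(For comparison, here is a scalar sketch of one GRU step — my own illustration, not from the article; the gate names z, r and the parameter layout are the usual convention, though papers vary. Note it has only two gates and no separate cell state, which is why it's often considered simpler:)

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, W, U, b):
    # W, U, b are dicts keyed by gate name: 'z' (update), 'r' (reset),
    # 'h' (candidate). Scalars for readability; real layers use matrices.
    z = sigmoid(W['z'] * x + U['z'] * h_prev + b['z'])                # update gate
    r = sigmoid(W['r'] * x + U['r'] * h_prev + b['r'])                # reset gate
    h_tilde = math.tanh(W['h'] * x + U['h'] * (r * h_prev) + b['h'])  # candidate state
    h = (1 - z) * h_prev + z * h_tilde                                # interpolate old and new
    return h
```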

------
amelius
Too hand-wavy for my taste. I have the feeling I still don't know why they
work.

~~~
senatorobama
Agreed. Don't know why this article gets high praise, I'm sure the pretty
pictures help.

~~~
colah3
Hi! I'm the author. I was also pretty surprised by the amount of attention
this article has gotten. That said, I can give you my take on how it advanced
the dialogue.

As you note, the main contribution of this article is the diagrams -- but I
think it's a bit deeper than being pretty. Previously, diagrams of LSTMs had
lots of loops in them. This made them hard to understand. (In fact, they were
ambiguous because of unclear execution order.) The diagrams in this article
unrolled the loops, which seems to be a much easier way to reason about LSTMs.
They also work at the abstraction of layers instead of weights and matrix
multiplies, which makes the diagrams less noisy and focuses the reader on the
important ideas.

Since diagrams were previously challenging to consume and understand,
explanations prior to my article tended to focus on the six equations that
define an LSTM. While that's a relatively clear way to think about LSTMs once
you've really internalized them, it can be very hard when you are trying to
understand them for the first time. In particular, the equations introduce ~15
variables. If you aren't already familiar with those variables, it's extremely
challenging to keep them all in your working memory and understand what's
going on.
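(To make that concrete, the six equations can be sketched as one step of an LSTM in scalar form — my own illustration, not code from the article; the gate names i, f, o and candidate g follow common notation. Counting x, h, c and the four W/U/b triples gives roughly the fifteen variables mentioned above:)

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b are dicts keyed by gate name: 'i', 'f', 'o', 'g'.
    # Scalars for readability; real layers use matrices and vectors.
    i = sigmoid(W['i'] * x + U['i'] * h_prev + b['i'])    # input gate
    f = sigmoid(W['f'] * x + U['f'] * h_prev + b['f'])    # forget gate
    o = sigmoid(W['o'] * x + U['o'] * h_prev + b['o'])    # output gate
    g = math.tanh(W['g'] * x + U['g'] * h_prev + b['g'])  # candidate cell value
    c = f * c_prev + i * g    # new cell state: the "horizontal line at the top"
    h = o * math.tanh(c)      # new hidden state
    return h, c
```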

In contrast, the diagrams don't require a new reader to hold lots of things in
working memory, and visually hint a great deal of the narrative (like the
horizontal line at the top).

------
rahulchhabra07
This was my first reading on LSTMs, and it was smooth and lucid. I did need to
read more literature and code afterward, but it's a great resource to begin with.

------
nafizh
Previous discussion:
[https://news.ycombinator.com/item?id=10130341](https://news.ycombinator.com/item?id=10130341)

