
Visualizing Memorization in RNNs - stared
https://distill.pub/2019/memorization-in-rnns/
======
selimonder
Great post. Long sequences are a problem I have been dealing with long enough
now. Even though GRU outperforms LSTM in longer sequences, it still isn't
enough and vanishing gradients problem is still experienced. Currently, one-
dimensional convolution units are effective while dealing with longer time
steps, but sadly it's not the precise unit people wish to use while processing
longer time-steps.

