
Simplified Gating in Long Short-Term Memory (LSTM) Recurrent Neural Networks - MichaelBurge
https://arxiv.org/abs/1701.03441
======
Smerity
To the Hacker News community, as I'm genuinely curious, does this link
contribute anything to you or your understanding of deep learning? 29 people
voted for it, zero comments, and it's still on the front page after five
hours. It's a very technical and very specific paper that has not yet had any
real analysis by the broader ML community. From previous discussions on deep
learning here, I suspect that seeing "neural networks" is an upvote trigger,
but that very little good discussion follows past that.

I work in the field and I might read this later - but that's honestly only a
might. The datasets they examine aren't particularly impactful or interesting
and the paper is preliminary.

MNIST is a dataset that's routinely complained about in vision (someone
recently noted it's more a unit test than a benchmark), and it's infrequently
used for RNNs anyway, other than perhaps permuted MNIST, which isn't used
here. The IMDb dataset is at least standard for RNNs but still isn't
representative of the complexity of many sequence tasks.
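
(For those unfamiliar: "sequential MNIST" feeds each image to the RNN row by
row or pixel by pixel, and "permuted MNIST" applies one fixed random pixel
permutation so local structure is destroyed and long-range dependencies
dominate. A minimal numpy sketch, with example shapes I've picked purely for
illustration:)

    import numpy as np

    images = np.random.rand(64, 28, 28)      # stand-in for a batch of MNIST digits
    row_seqs = images                        # rows as 28 time steps of 28-dim inputs
    pixel_seqs = images.reshape(64, 784, 1)  # pixels as 784 time steps of 1-dim inputs

    # Permuted MNIST: one fixed permutation applied to every pixel sequence.
    perm = np.random.permutation(784)
    permuted_seqs = pixel_seqs[:, perm, :]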

The primary claim is that the simpler LSTM1/2/3 variants can achieve numbers
similar to those of the standard LSTM when a proper hyper-parameter search is
used. That's potentially useful to know, but it's also likely not what's
stopping practitioners from putting such work into the field. Many
architectures are also limited in the number of times they can be trained due
to computational restrictions - otherwise I'd usually strongly suggest hyper-
parameter search!
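
(For readers without the background: one step of a standard LSTM computes
three gates plus a candidate cell state; the paper's LSTM1/2/3 variants drop
terms from the gate equations - see the paper for their exact definitions,
which I'm not reproducing here. A minimal numpy sketch of the standard step:)

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        # W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
        # packed in the order [input gate, forget gate, output gate, candidate].
        H = h_prev.shape[0]
        z = W @ x + U @ h_prev + b
        i = sigmoid(z[0*H:1*H])    # input gate
        f = sigmoid(z[1*H:2*H])    # forget gate
        o = sigmoid(z[2*H:3*H])    # output gate
        g = np.tanh(z[3*H:4*H])    # candidate cell state
        c = f * c_prev + i * g     # new cell state
        h = o * np.tanh(c)         # new hidden state
        return h, c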

If people are interested in this type of analysis of RNN architectures, I
recommend the older but still useful "An Empirical Exploration of Recurrent
Network Architectures"[1]. The primary contribution there is that the forget
gate bias should be initialized to 1 for LSTMs - a trick that was used and
then forgotten for many years - but they also present various LSTM variants
(MUT1/2/3) that are more computationally efficient. These were integrated
into Keras (a Python machine learning library) for some time. They also show
their results over more datasets (arithmetic, XML modeling, language modeling
on PTB, and music prediction), making for a more convincing and nuanced
discussion.
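
(The "forget gate set to 1" trick is an initialization choice: start the
forget gate bias at 1 so the cell remembers by default instead of forgetting.
With the packed bias layout from the sketch above, and an example hidden
size:)

    import numpy as np

    H = 128               # hidden size (example value)
    b = np.zeros(4 * H)   # packed biases: [input, forget, output, candidate]
    b[1*H:2*H] = 1.0      # initialize the forget gate bias to 1

Keras exposes the same trick on its LSTM layer via unit_forget_bias=True,
which is the default.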

P.S. I'll note I saw this on Nando de Freitas' Twitter feed and assume that's
why it was posted here (given he's a Professor of Computer Science at the
University of Oxford and a senior researcher at Google). A retweet doesn't
constitute an endorsement though, especially in science. I'm still confused as
to why Hacker News, a very general tech crowd, cares particularly about one
deep learning paper and not another. Color me confused :)

[1]: https://research.google.com/pubs/pub45473.html

~~~
jacek
This should not be on HN for two reasons:

1) Only people who already know and have used RNNs could be interested

2) It is not a good paper

Let me explain. MNIST is not a good dataset for testing LSTMs. LSTMs were
designed to handle sequential data. Of course you can use them with MNIST, but
a simple feed-forward network would be much easier and faster to train; RNNs
give no advantage here. The IMDb dataset is a decent choice for tests, but
there is no convincing argument (or result) that makes LSTM1/LSTM2/LSTM3 a
good choice. The "Best accuracies of different LSTMs" table tells us
absolutely nothing! You can just cherry-pick the best results and ignore the
others. Another issue is that the authors didn't test GRUs. GRUs are based on
the LSTM design but simplified a bit; they are known to perform better in many
cases and are faster to train.
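
To make "simplified a bit" concrete: a GRU merges the input and forget gates
into a single update gate and drops the separate cell state. A minimal numpy
sketch of one GRU step, using one common convention (formulations vary
slightly between papers):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x, h_prev, W, U, b):
        # W: (3H, D), U: (3H, H), b: (3H,), packed as [update, reset, candidate].
        H = h_prev.shape[0]
        z = sigmoid(W[:H] @ x + U[:H] @ h_prev + b[:H])              # update gate
        r = sigmoid(W[H:2*H] @ x + U[H:2*H] @ h_prev + b[H:2*H])     # reset gate
        g = np.tanh(W[2*H:] @ x + U[2*H:] @ (r * h_prev) + b[2*H:])  # candidate
        return (1.0 - z) * h_prev + z * g    # no separate cell state to carry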

The paper recommended by user Smerity, "An Empirical Exploration of Recurrent
Network Architectures", is the best comparison of RNN cell architectures and
their performance that I know of, and I recommend it too.

~~~
hcrisp
A better dataset for testing RNNs in general and LSTMs in particular would be
the UCR Time Series Classification Archive [0].

[0]: http://www.cs.ucr.edu/~eamonn/time_series_data

