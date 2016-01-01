I work in the field and I might read this later - but that's honestly only a might. The datasets they examine aren't particularly impactful or interesting and the paper is preliminary.
MNIST is a standard, much-complained-about dataset in vision (with someone recently noting it's more a unit test than a benchmark) but is infrequently used as a dataset for RNNs, other than potentially permuted MNIST, which isn't used here. The IMDb dataset is at least standard for RNNs but is also not representative of the complexity of many sequence tasks.
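For anyone unfamiliar with the permuted variant, here is a minimal sketch of the idea, assuming the usual setup: each 28x28 image is flattened into a 784-step pixel sequence, and one fixed random permutation is applied to every image. The function names and the synthetic stand-in image are my own illustration, not anything from the paper.

```python
import random


def make_permutation(n_pixels=784, seed=0):
    """Return one fixed random ordering of pixel indices, shared by all images."""
    rng = random.Random(seed)
    order = list(range(n_pixels))
    rng.shuffle(order)
    return order


def to_sequence(image, order=None):
    """Flatten a 2-D image (a list of rows) into a 1-D pixel sequence.

    If `order` is given, the pixels are read out in that permuted order.
    """
    flat = [pixel for row in image for pixel in row]
    if order is None:
        return flat
    return [flat[i] for i in order]


# Synthetic stand-in for a 28x28 MNIST digit (hypothetical data).
image = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]

order = make_permutation(n_pixels=28 * 28, seed=0)
plain = to_sequence(image)            # row-by-row pixel sequence
permuted = to_sequence(image, order)  # same pixels, scrambled order
```

The permutation destroys local spatial structure, so the model has to carry information across the whole sequence, which is why the permuted version is generally seen as the harder test for RNNs.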
The primary statement being made is that the simpler LSTM1/2/3 can achieve similar numbers to that of the LSTM when using a proper hyper-parameter search. That's potentially useful to know but also likely not the thing stopping practitioners from putting such work into the field. Many architectures are also limited in the number of times they can be trained due to computational restrictions - otherwise I'd usually strongly suggest hyper-parameter search!
If people are interested in this type of analysis over RNN architectures, I recommend the older but still useful "An Empirical Exploration of Recurrent Network Architectures"[1]. The primary contribution there is that the forget gate bias should be initialized to 1 for LSTMs, a trick which was used and then forgotten for many years, but they do present various LSTM variants (MUT1/2/3) that are more computationally efficient too. These were integrated into Keras (a Python machine learning library) for some time. They also show their results over more datasets (arithmetic, XML modeling, language modeling on PTB, and music prediction) for a more convincing and nuanced discussion.
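To make the forget gate trick concrete, here is a rough sketch in plain Python. It assumes the four gate biases are stored in one concatenated vector in the order input, forget, cell, output; that layout and the helper names are my own assumptions, and real libraries differ in how they order the gates.

```python
import math

# Assumed gate layout within the concatenated bias vector (hypothetical).
GATES = ("input", "forget", "cell", "output")


def init_lstm_bias(hidden_size, forget_bias=1.0):
    """Zero-initialise all gate biases, then set the forget gate slice."""
    bias = [0.0] * (len(GATES) * hidden_size)
    start = GATES.index("forget") * hidden_size
    for i in range(start, start + hidden_size):
        bias[i] = forget_bias
    return bias


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


bias = init_lstm_bias(hidden_size=4)

# With zero pre-activations, the forget gate starts at sigmoid(1), about 0.73,
# instead of sigmoid(0) = 0.5, so more of the cell state is kept early on.
forget_at_init = sigmoid(bias[4])
```

The point is that the cell state decays more slowly at the start of training, which makes it easier for gradients to flow over long time spans.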
P.S. I'll note I saw this on Nando de Freitas' Twitter feed and assume that's why it was posted here (given he's a Professor of Computer Science at the University of Oxford and a senior researcher at Google). A retweet doesn't constitute an endorsement though, especially in science. I'm still confused as to why Hacker News, a very general crowd in tech, cares particularly about one deep learning paper and not another. Color me confused :)
[1]: https://research.google.com/pubs/pub45473.html
1) Only people who already know and have used RNNs could be interested
2) It is not a good paper
Let me explain. MNIST is not a good dataset for testing LSTMs. LSTMs were designed to handle sequential data. Of course you can use them with MNIST, but a simple feed-forward network would be much easier and faster to train. RNNs give no advantage here. The IMDb dataset is a decent choice for tests, but there is no convincing argument (or result) that makes LSTM1/LSTM2/LSTM3 a good choice. The "Best accuracies of different LSTMs" table tells us absolutely nothing! You can just cherry-pick the best results and ignore the others. Another issue is that the authors didn't test GRUs. GRUs are based on the LSTM design, but simplified a bit. GRUs are known to perform better in many cases and are faster to train.
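A back-of-the-envelope way to see the "simplified a bit" point is a parameter count, sketched below. It assumes the textbook formulation where every gate or candidate block has one input matrix, one recurrent matrix, and one bias vector, with four such blocks for an LSTM and three for a GRU. Actual library implementations can differ slightly (for example, with extra bias terms), and the layer sizes are arbitrary.

```python
def recurrent_params(input_size, hidden_size, n_blocks):
    """Parameter count for a recurrent cell made of `n_blocks` gate blocks.

    Each block is assumed to have an input matrix, a recurrent matrix,
    and a single bias vector.
    """
    per_block = (hidden_size * input_size
                 + hidden_size * hidden_size
                 + hidden_size)
    return n_blocks * per_block


def lstm_params(input_size, hidden_size):
    # Input, forget, and output gates plus the cell candidate.
    return recurrent_params(input_size, hidden_size, n_blocks=4)


def gru_params(input_size, hidden_size):
    # Update and reset gates plus the hidden candidate.
    return recurrent_params(input_size, hidden_size, n_blocks=3)


# Arbitrary example sizes (hypothetical).
lstm = lstm_params(input_size=128, hidden_size=256)
gru = gru_params(input_size=128, hidden_size=256)
ratio = gru / lstm  # 0.75 under these assumptions
```

Under these assumptions a GRU has three quarters of the parameters of an LSTM of the same width, which is consistent with it being cheaper per step, though it says nothing by itself about accuracy.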
The paper recommended by user Smerity is the best paper comparing RNN cell architectures and their performance I know and I recommend it too ("An Empirical Exploration of Recurrent Network Architectures")
[0] http://www.cs.ucr.edu/~eamonn/time_series_data
Second, though, I've sadly stopped responding to (or even reading) many of the machine learning and especially broader artificial intelligence posts on Hacker News. The discussions are rarely ones that one can contribute to constructively, meaning the time spent rarely provides a positive return. My primary social channel for ML/AI is Twitter - a great community but a depressingly inadequate tool for such discussions. For many other topics, however, Hacker News is still my go-to community.
It's a shame as I love to discuss machine learning and artificial intelligence but, amongst other things, there's a Godwin-style law that these discussions inevitably lead to comparisons to the human brain and/or the singularity and/or killer robots and/or "no bias" in machine learning.
On the other hand, I think there are also enough people on HN who know enough about those subjects that the comments that get the most upvotes (and, consequently, rise to the top) are the ones that make the most sense.
Edit: to be fair, all the stuff about brains and singularities is not the fault of HN users. Very prominent researchers (Hinton, Schmidhuber, others) keep pushing the "based on the brain" line, for instance.
I'm a little surprised it got so many votes, too. I would've liked to see discussion on the Spatial Transformer Networks paper I posted yesterday, which seemed much more useful.
