
Implementing Neural Turing Machines - godelmachine
https://arxiv.org/abs/1807.08518
======
orbifold
The author states in his GitHub repository
([https://github.com/MarkPKCollier/NeuralTuringMachine](https://github.com/MarkPKCollier/NeuralTuringMachine))
that his work is based on
([https://github.com/snowkylin/ntm](https://github.com/snowkylin/ntm)). If
that is the case, then I find it kind of strange that he has relicensed the
work from LGPL3 to MIT and removed any reference to the original author in the
LICENSE. LGPL3 requires any derivative work to be licensed the same and to
retain mentions of the authors; moreover, it also requires a clear explanation
of the modifications that were made.

Also, compared to the open-source implementation
([https://github.com/snowkylin/ntm](https://github.com/snowkylin/ntm)), it
seems like his main novel claim is that he looked at different memory
initialisation patterns.

Edit:

Compare the original:
[https://github.com/snowkylin/ntm/blob/master/ntm/ntm_cell.py](https://github.com/snowkylin/ntm/blob/master/ntm/ntm_cell.py)

to the derivative work:
[https://github.com/MarkPKCollier/NeuralTuringMachine/blob/ma...](https://github.com/MarkPKCollier/NeuralTuringMachine/blob/master/ntm.py)

From what I can tell, the main innovation is that the derivative work uses a
named tuple instead of a dictionary for state keeping, and there is new memory
initialisation code. The original author apparently initialised the memory
randomly. I also feel like the paper should cite the implementation they are
basing their work on. The paper
[https://arxiv.org/pdf/1807.08518.pdf](https://arxiv.org/pdf/1807.08518.pdf)
merely states on page one that other implementations exist and makes no
mention of the fact that their implementation is based on one of those.
Combined with the fact that they are asking people in the README to cite
their paper, this does not feel like a very good idea.
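
To illustrate, the state-keeping change amounts to roughly this (a minimal
sketch with made-up field names, not the exact keys either repo uses):

    from collections import namedtuple
    import numpy as np

    # Placeholder arrays standing in for the real TensorFlow tensors.
    memory = np.zeros((128, 20))  # N memory slots x M cells per slot
    read_w = np.zeros(128)        # one head's addressing weights
    read_v = np.zeros(20)         # the last read vector

    # Dictionary-based state keeping, as in the original implementation.
    state = {'M': memory, 'w': read_w, 'read_vector': read_v}

    # The same state as a NamedTuple, as in the derivative work: immutable,
    # attribute access, and it nests cleanly into an RNN cell's state.
    NTMState = namedtuple('NTMState', ['M', 'w', 'read_vector'])
    state = NTMState(M=memory, w=read_w, read_vector=read_v)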

~~~
markpkcollier
I am the author of this work. I was not aware of the difference in the
licenses and just used the GitHub default. I have updated it accordingly,
thanks for pointing this out.

A note on the difference between our work and Snowkylin's: the code
implementing the operations of the NTM is very similar, and both are similar
to other open-source NTM implementations, as the operations of an NTM are
defined by the equations of the original NTM paper and have a clear mapping
into code.

The primary difference is that our implementation works and is stable. The
code changes needed to achieve this are minimal, but it still required
substantial experimentation and work to figure out what was causing slow
convergence and the gradients becoming NaN (causing training to fail) in other
implementations. Thus our primary contribution is not to put an NTM into code
but to get that code to train reliably and quickly.
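
To make "a clear mapping into code" concrete: the original paper's
content-based addressing is a softmax over the cosine similarity between a key
and each memory row, sharpened by a key strength. Roughly, in a NumPy sketch
(the variable names are mine, not a line from either repo):

    import numpy as np

    def content_addressing(M, k, beta, eps=1e-8):
        # Cosine similarity between the key k and every memory row M[i].
        sim = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + eps)
        # Softmax sharpened by the key strength beta gives address weights.
        e = np.exp(beta * sim)
        return e / e.sum()

    # Toy usage: 4 memory slots of width 3; weights concentrate on rows
    # most similar to the key.
    M = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 1., 0.]])
    print(content_addressing(M, k=np.array([1., 0., 0.]), beta=5.0))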

~~~
orbifold
Just to be clear, I went through the two implementations
([https://github.com/snowkylin/ntm/blob/master/ntm/ntm_cell.py](https://github.com/snowkylin/ntm/blob/master/ntm/ntm_cell.py))
and
([https://github.com/MarkPKCollier/NeuralTuringMachine/blob/ma...](https://github.com/MarkPKCollier/NeuralTuringMachine/blob/master/ntm.py))
line by line. The code is not just similar, but largely identical. You've used
the exact same variable names and ordering in the code. Some comments have
been removed. The notable differences I can spot are:

\- lines 53-56 (in your code), which correspond to lines 41-45 in Snowkylin's
code

\- line 70 vs 69, where you've used a built-in function instead of the
explicit compositional form of softplus (sketched below)

\- finally, the major change I can see is the various variable initialisation
schemes in lines 147-178 of your code, which are handled by lines 157-185 in
Snowkylin's code

and some miscellaneous places where you've used a NamedTuple instead of a
dictionary. And yes, the choice of initialisation is a valid improvement. As
far as I can tell, Snowkylin's implementation is not broken, but just might
learn more slowly because of a different state initialisation scheme. It does
not suffer from the NaN problem, for example.
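
For what it's worth, the softplus swap is mathematically a no-op but not
numerically. A quick sketch of the difference (my own NumPy code, not from
either repo, with the stable form written in the spirit of what built-ins
such as tf.nn.softplus compute):

    import numpy as np

    def softplus_explicit(x):
        # The compositional form: log(1 + exp(x)). Overflows for large x.
        return np.log(1.0 + np.exp(x))

    def softplus_stable(x):
        # A numerically safe rewrite of the same function.
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    x = np.array([-30.0, 0.0, 30.0, 800.0])
    print(softplus_explicit(x))  # the last entry overflows to inf
    print(softplus_stable(x))    # stays finite at 800.0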

It should also be communicated _very_ clearly in the paper that the code you
are releasing is derived from another open-source implementation, with a
precise explanation of the changes you have made. Also, you can't change the
LICENSE and copyright notice of open-source code to yourself without
permission. I believe that even in its current form your GitHub repo is in
breach of the LGPL3. For more details on how to use the LGPL properly, please
refer to
[https://www.gnu.org/licenses/gpl-howto.en.html](https://www.gnu.org/licenses/gpl-howto.en.html).
Notice that the original repository actually does not carry out all those
steps properly either.

Namely, it is missing license and copyright notices in the individual files,
and it doesn't have an explicit copyright notice anywhere.

~~~
andreyk
Indeed, if these are so similar, your repo really ought to be a fork of the
prior repo, and you should certainly acknowledge them more in the paper (and
make clearer what your contributions are).

------
YorkshireSeason
Important and unsurprising sentence from the paper's abstract: " _A number of
open source implementations of NTMs exist but are unstable during training
and/or fail to replicate the reported performance of NTMs_"

~~~
simias
Why do you deem it unsurprising? Do you think that open-source neural-network
implementations are subpar?

~~~
YorkshireSeason
Anecdotally: I've tried to replicate some recent AI/ML papers and failed. So
have some of my acquaintances.

------
a_d
“how the memory contents of a NTM are initialized may be a defining factor in
the success of a NTM implementation”

Does this mean that one can expect better explainability from these models in
the future?

~~~
albertzeyer
The result was that constant/zero initialization was the best, which is the
most natural choice anyway (for me at least), and definitely also the most
simple option. I'm a bit surprised that they emphasise so much on that. Also,
not really sure what to learn from this.
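
For concreteness, the schemes being compared amount to something like this (a
rough NumPy sketch of the initialisation options, not the paper's code; the
shapes and the noise scale are made up):

    import numpy as np

    N, M = 128, 20  # memory slots x cells per slot
    rng = np.random.default_rng(0)

    # Random initialisation, apparently used in the original implementation:
    # every training run starts the memory from different noise.
    mem_random = rng.normal(scale=0.5, size=(N, M))

    # Constant (near-zero) initialisation, the reported winner: every run
    # starts from the same small value, so early reads carry no noise and
    # the gradients through the memory are better behaved.
    mem_constant = np.full((N, M), 1e-6)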

~~~
stochastic_monk
Does this correspond to an intuition that a tabula rasa may be the best way
for a learner to start?

------
yters
It'd be interesting to see a Neural TM that can increase its Kolmogorov
complexity.

------
carlc75
The associated GitHub repo is at
[https://github.com/MarkPKCollier/NeuralTuringMachine](https://github.com/MarkPKCollier/NeuralTuringMachine)

