
A new model and dataset for long-range memory - atg_abhishek
https://deepmind.com/blog/article/A_new_model_and_dataset_for_long-range_memory
======
cs702
Another great blog post on great research by the DeepMind guys, who are
simultaneously releasing a new dataset for long-range language modeling.

The post is worth reading in its entirety.

If I may summarize: the authors propose a transformer augmented with a
short-term memory mechanism (analogous to Transformer-XL) as well as a new
long-term memory mechanism that learns to 'compress and memorize' embeddings
evicted from the short-term memory. The model is trained on book-length
samples (!!!!), and seems to perform significantly better than prior models at
generating language with long-range context. To my eyes, text generated by the
trained model is virtually indistinguishable from human output, and
qualitatively superior to GPT-2 samples.
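
As a rough sketch of the compression step (my own reading of the paper, not
DeepMind's code; the module, shapes, and default rate below are assumptions
based on the strided-convolution variant they describe), in PyTorch it might
look something like:

    import torch.nn as nn

    class CompressiveMemory(nn.Module):
        """Sketch: squeeze activations evicted from the short-term memory
        into fewer long-term slots via a strided 1-D convolution."""
        def __init__(self, d_model, compression_rate=3):
            super().__init__()
            # Each output slot summarizes `compression_rate` old memories.
            self.compress = nn.Conv1d(d_model, d_model,
                                      kernel_size=compression_rate,
                                      stride=compression_rate)

        def forward(self, old_memories):
            # old_memories: (seq_len, batch, d_model), about to be discarded
            x = old_memories.permute(1, 2, 0)  # (batch, d_model, seq_len)
            x = self.compress(x)               # time axis shrinks by the rate
            return x.permute(2, 0, 1)          # (seq_len // rate, batch, d_model)

The compressed slots then sit behind the short-term memory and are attended
over like any other past activations.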

~~~
JRKrause
Agreed that the generated sample is superior to similar outputs from GPT-2.
Looking at the additional samples in the publication, my first thought is that
the model cannot easily stray from or modify the context. Once a fact is
stored within the compressed memory, it seems the model cannot easily generate
sentences that contradict that fact. This is problematic because frequent
changes to relational information (e.g. where a character is standing) are
fundamental to storytelling.

------
ColanR
So is there a place to download the trained model? I don't see anything but
the dataset available.

~~~
gwern
They probably won't since DM doesn't open-source most of its work. The authors
claimed way back in November that they'd at least open-source the code
([https://openreview.net/forum?id=SylKikSYDH](https://openreview.net/forum?id=SylKikSYDH))
but nothing yet. (The model isn't so big that open-sourcing it is all _that_
important. It's no Turing-NLG
([https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/)),
that's for sure!)

In the meantime, there's always Reformer, which has Trax and PyTorch
implementations.

~~~
hooande
I believe the pre-trained Transformer-XL model can also be downloaded, to
provide long-term memory functionality similar to that of the Compressive
Transformer. I don't have a direct link, but it's available via huggingface
[https://huggingface.co/transformers/pretrained_models.html](https://huggingface.co/transformers/pretrained_models.html)
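
For anyone who wants to try it, loading that checkpoint is only a few lines
with the transformers library (a minimal sketch; I'm assuming the
`transfo-xl-wt103` checkpoint name from their model list):

    # Load the pretrained Transformer-XL checkpoint via huggingface
    # transformers; the `mems` in the output carry the recurrent memory.
    from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

    tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
    model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

    input_ids = tokenizer.encode("The model stores its recurrent state in",
                                 return_tensors="pt")
    outputs = model(input_ids)  # includes prediction scores and `mems`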

~~~
gwern
Yeah. I didn't mention Transformer-XL because I'm not sure how much long-range
dependency it actually learns to handle. The only papers I've seen on
recurrence indicate that recurrent models tend to learn very short-range
dependencies, while something like Reformer, with direct access to thousands
of timesteps, seems more likely to actually be making use of them.

------
ganzuul
From the research paper's description of how they compress the memory, it
sounds like a form of meta-learning.

Perhaps a network like this would be interested in reading the same books more
than once. Perhaps it could find favorite books it wanted to read many times.

~~~
nloladze
So the simple version of it is that it links parts of memory together? Just
like human memory? Trying to keep the most valid parts of it together.

~~~
ganzuul
In principle, a feature of compression is exactly this. Lots of potential in
this space.

------
zackmorris
Thank you, this just made a huge connection for me between sleep, memory, and
their role in decision making (in the "consolidated episodic memories" link):

[https://www.ncbi.nlm.nih.gov/pubmed/28641107](https://www.ncbi.nlm.nih.gov/pubmed/28641107)

I was suffering from sleep apnea at this time last year and was on call 1 out
of every 3 weeks, so I was not defragging my brain's hard drive. I got
decision fatigue and my productivity fell to 10%, which led to me being unable
to work for several months.

