Speaking as an academic for a minute: that this works was already known from "Hopfield Networks Is All You Need". That paper even has some theoretical work on how much each layer can store, IIRC. It's odd that they didn't cite it, since this is a clear extension of that work.
ML in general seems to have a really low standard for citing related and earlier work. In part I think that's because a lot of papers rest on relatively poorly developed foundations or empirical evidence. A prime example is the "Neural ..." line of papers, which in my view largely took existing research from the optimal control / differential equations literature, prepended the word "Neural", and provided some "interesting" application. That would be fine if proper context for the prior work were given, but unfortunately that wasn't always the case.
Thanks for sharing. I have been working on vector search engines (IR) for a while (Aquila Network), and I believe work like this, along with the differentiable neural computers (a couple of papers from DeepMind), will be the next breakthrough in IR. I can't see the direct path yet; we're still waiting for a usable architecture to emerge. I believe current vector indexes will eventually evolve into hierarchical random-access memories that store compressed information in higher dimensions (the static, replicated part of the distributed system). On top of that, an application-specific information decoder (the dynamic part of the system, the UX) will use the underlying information accordingly.
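For reference, what a "current vector index" does today is basically nearest-neighbour search over dense embeddings. A brute-force toy sketch (the embeddings below are random stand-ins; real systems use ANN structures like HNSW instead of a full scan):

    import numpy as np

    def build_index(doc_vectors):
        # Normalise once so a dot product equals cosine similarity
        norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
        return doc_vectors / norms

    def search(index, query_vector, k=5):
        q = query_vector / np.linalg.norm(query_vector)
        scores = index @ q                    # cosine similarity to every doc
        top = np.argsort(-scores)[:k]         # best k doc ids
        return list(zip(top.tolist(), scores[top].tolist()))

    docs = np.random.randn(1000, 64)          # stand-in embeddings
    index = build_index(docs)
    print(search(index, docs[42])[0])         # (42, ~1.0)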
This is fascinating in terms of pushing the creative boundaries of how to do IR with neural networks and showing the potential headroom that's available.
It seems wildly impractical to productionize at the moment (excepting Google, perhaps). If I'm not misunderstanding, the index is actually built by training a neural network (they use models ranging from 250M to 11B parameters, i.e. roughly 500 MB to 22 GB in size). Still, for an important collection of documents, this might be how it's done in a few years.
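For what it's worth, those sizes are just parameter count times bytes per weight, assuming fp16 storage (2 bytes per parameter is my assumption, not something stated in the paper):

    # Back-of-envelope model size, assuming fp16 weights (2 bytes/param)
    for params in (250e6, 11e9):
        print(f"{params / 1e9:.2f}B params -> {params * 2 / 1e9:.1f} GB")
    # 0.25B params -> 0.5 GB   (~500 MB)
    # 11.00B params -> 22.0 GB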
Ever since I learned about and implemented a learned index, I've been seeing the power of these techniques. I don't care much for computer vision, but generic performance-improvement primitives? Yes please.
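For anyone who hasn't seen one: at its simplest, a learned index is a model that predicts where a key sits in a sorted array, plus a bounded correction search. A toy sketch (max_error here is an arbitrary illustrative bound; real implementations derive it from the model's residuals):

    import numpy as np

    def fit_learned_index(keys):
        # keys must be sorted; fit a line mapping key -> array position
        positions = np.arange(len(keys))
        slope, intercept = np.polyfit(keys, positions, 1)
        return slope, intercept

    def lookup(keys, model, key, max_error=512):
        slope, intercept = model
        guess = int(np.clip(slope * key + intercept, 0, len(keys) - 1))
        # Correct the prediction with a bounded search around the guess
        lo, hi = max(0, guess - max_error), min(len(keys), guess + max_error)
        i = lo + np.searchsorted(keys[lo:hi], key)
        return i if i < len(keys) and keys[i] == key else None

    keys = np.sort(np.random.randint(0, 1_000_000, size=10_000))
    model = fit_learned_index(keys)
    print(lookup(keys, model, keys[1234]))    # a position holding that key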
A transformer network is a variant of associative memory. You give it a query and it returns a value that it has learned to associate with the query during training.
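In its most stripped-down form, that lookup is just a softmax-weighted read over stored key/value pairs, e.g. (toy sketch, nothing to do with the paper's specific architecture):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attend(query, keys, values):
        # Score the query against every stored key, then return a
        # similarity-weighted mix of the associated values
        scores = keys @ query / np.sqrt(keys.shape[1])
        return softmax(scores) @ values

    keys = np.eye(3)                                   # three stored "addresses"
    values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(attend(10 * np.array([1.0, 0.0, 0.0]), keys, values))  # ~[1, 0]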
> You give it a query and it returns a value that it has learned to associate with the query during training.
The zero-shot scenario they describe does not work like this. They explicitly mention that it's not trained with any queries (which is what makes it a very promising technique).
Sorry for the imprecise phrasing. During training, patterns are stored in the network parameters. A query will naturally be similar to one or several of the stored patterns, which are then returned (in Hopfield-speak, "completed").
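Concretely, in the modern-Hopfield picture a partial or noisy query is iterated toward the closest stored pattern, something like this toy sketch (beta and the patterns are arbitrary):

    import numpy as np

    def complete(stored, query, beta=4.0, steps=3):
        # stored: one pattern per row; the query is pulled toward the
        # stored pattern it most resembles (the "completion")
        for _ in range(steps):
            scores = beta * (stored @ query)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            query = weights @ stored
        return query

    patterns = np.array([[ 1.0, 1.0, -1.0, -1.0],
                         [-1.0, 1.0,  1.0, -1.0]])
    partial = np.array([1.0, 1.0, 0.0, 0.0])   # only half of pattern 0
    print(complete(patterns, partial))          # ~[1, 1, -1, -1]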
The idea is certainly interesting, but given the dependency on an underlying query-to-document click log, it's not the most obvious solution.
Inverted indices with fuzzy and boolean matching are pretty boring, but they're still trivial to stand up and don't require anything beyond the actual corpus to build.
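To illustrate how trivial that baseline is, a minimal inverted index with boolean AND matching is only a few lines (no fuzzy matching here):

    from collections import defaultdict

    def build_inverted_index(docs):
        index = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for term in text.lower().split():
                index[term].add(doc_id)            # posting list per term
        return index

    def boolean_and(index, query):
        postings = [index.get(term, set()) for term in query.lower().split()]
        return set.intersection(*postings) if postings else set()

    docs = ["neural networks for retrieval",
            "inverted index retrieval",
            "boolean matching with inverted lists"]
    index = build_inverted_index(docs)
    print(boolean_and(index, "inverted retrieval"))  # {1}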