Speaking as an academic for a minute: that this works was already known from "Hopfield Networks Is All You Need". That paper even has some theoretical work on how much each layer can store, IIRC. It's odd that they didn't cite it, since this is a clear extension of that work.
ML in general seems to have a really low standard for citing related and earlier work. In part I think that's because a lot of papers rest on relatively poorly developed foundations or empirical evidence. A prime example is the "Neural ..." line of papers, which in my view largely took existing research from the optimal control / differential equations literature, prepended the word "Neural", and provided some "interesting" application. That would be fine if proper context for the prior work were given, but unfortunately that wasn't always the case.
Thanks for sharing. I have been working on vector search engines (IR) for a while (Aquila Network), and I believe work like this, along with the differentiable neural computers (a couple of papers from DeepMind), will be the next breakthrough in IR. I can't see the direct path yet; we're still waiting for a usable architecture to emerge. I believe current vector indexes will eventually evolve into hierarchical random-access memories that store compressed information in higher dimensions (the static, replicated part of the distributed system). On top of that, an application-specific information decoder (the dynamic part of the system, the UX) will use the underlying information accordingly.
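For reference, what a "current vector index" does today is basically nearest-neighbour search over dense embeddings. A brute-force toy sketch (the embeddings below are random stand-ins; real systems use ANN structures like HNSW instead of a full scan):

    import numpy as np

    def build_index(doc_vectors):
        # Normalise once so a dot product equals cosine similarity
        norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
        return doc_vectors / norms

    def search(index, query_vector, k=5):
        q = query_vector / np.linalg.norm(query_vector)
        scores = index @ q                    # cosine similarity to every doc
        top = np.argsort(-scores)[:k]         # best k doc ids
        return list(zip(top.tolist(), scores[top].tolist()))

    docs = np.random.randn(1000, 64)          # stand-in embeddings
    index = build_index(docs)
    print(search(index, docs[42])[0])         # (42, ~1.0)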
This is fascinating in terms of pushing the creative boundaries of how to do IR with neural networks and showing the potential headroom that's available.
It seems wildly impractical to productionize at the moment (excepting Google, perhaps). If I'm not misunderstanding, the index is actually built by training a neural network (they use models ranging from 250M to 11B parameters, i.e. roughly 500 MB to 22 GB in size). Still, for an important collection of documents, this might be how it's done in a few years.
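For what it's worth, those sizes are just parameter count times bytes per weight, assuming fp16 storage (2 bytes per parameter is my assumption, not something stated in the paper):

    # Back-of-envelope model size, assuming fp16 weights (2 bytes/param)
    for params in (250e6, 11e9):
        print(f"{params / 1e9:.2f}B params -> {params * 2 / 1e9:.1f} GB")
    # 0.25B params -> 0.5 GB   (~500 MB)
    # 11.00B params -> 22.0 GB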
Ever since I learned about and implemented a learned index, I've been seeing the power of these techniques. I don't care much for computer vision, but generic performance-improvement primitives? Yes please.
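For anyone who hasn't seen one: at its simplest, a learned index is a model that predicts where a key sits in a sorted array, plus a bounded correction search. A toy sketch (max_error here is an arbitrary illustrative bound; real implementations derive it from the model's residuals):

    import numpy as np

    def fit_learned_index(keys):
        # keys must be sorted; fit a line mapping key -> array position
        positions = np.arange(len(keys))
        slope, intercept = np.polyfit(keys, positions, 1)
        return slope, intercept

    def lookup(keys, model, key, max_error=512):
        slope, intercept = model
        guess = int(np.clip(slope * key + intercept, 0, len(keys) - 1))
        # Correct the prediction with a bounded search around the guess
        lo, hi = max(0, guess - max_error), min(len(keys), guess + max_error)
        i = lo + np.searchsorted(keys[lo:hi], key)
        return i if i < len(keys) and keys[i] == key else None

    keys = np.sort(np.random.randint(0, 1_000_000, size=10_000))
    model = fit_learned_index(keys)
    print(lookup(keys, model, keys[1234]))    # a position holding that key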
A transformer network is a variant of associative memory. You give it a query and it returns a value that it has learned to associate with the query during training.
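In its most stripped-down form, that lookup is just a softmax-weighted read over stored key/value pairs, e.g. (toy sketch, nothing to do with the paper's specific architecture):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attend(query, keys, values):
        # Score the query against every stored key, then return a
        # similarity-weighted mix of the associated values
        scores = keys @ query / np.sqrt(keys.shape[1])
        return softmax(scores) @ values

    keys = np.eye(3)                                   # three stored "addresses"
    values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(attend(10 * np.array([1.0, 0.0, 0.0]), keys, values))  # ~[1, 0]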
> You give it a query and it returns a value that it has learned to associate with the query during training.
The zero-shot scenario they describe does not work like this. They explicitly mention that it's not trained with any queries (which is what makes it a very promising technique).
Sorry for the imprecise phrasing. During training, patterns are stored in the network parameters. A query will naturally be similar to one or several of the stored patterns, which are then returned (in Hopfield-speak, "completed").
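Concretely, in the modern-Hopfield picture a partial or noisy query is iterated toward the closest stored pattern, something like this toy sketch (beta and the patterns are arbitrary):

    import numpy as np

    def complete(stored, query, beta=4.0, steps=3):
        # stored: one pattern per row; the query is pulled toward the
        # stored pattern it most resembles (the "completion")
        for _ in range(steps):
            scores = beta * (stored @ query)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            query = weights @ stored
        return query

    patterns = np.array([[ 1.0, 1.0, -1.0, -1.0],
                         [-1.0, 1.0,  1.0, -1.0]])
    partial = np.array([1.0, 1.0, 0.0, 0.0])   # only half of pattern 0
    print(complete(patterns, partial))          # ~[1, 1, -1, -1]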
The idea is certainly interesting, but given the dependency on an underlying query-to-document click log, it's not the most obvious solution.
Inverted indices with fuzzy and boolean matching are pretty boring, but they're still trivial to stand up and don't require anything beyond the actual corpus to build.
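To illustrate how trivial that baseline is, a minimal inverted index with boolean AND matching is only a few lines (no fuzzy matching here):

    from collections import defaultdict

    def build_inverted_index(docs):
        index = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for term in text.lower().split():
                index[term].add(doc_id)            # posting list per term
        return index

    def boolean_and(index, query):
        postings = [index.get(term, set()) for term in query.lower().split()]
        return set.intersection(*postings) if postings else set()

    docs = ["neural networks for retrieval",
            "inverted index retrieval",
            "boolean matching with inverted lists"]
    index = build_inverted_index(docs)
    print(boolean_and(index, "inverted retrieval"))  # {1}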