Is Unlimited Context Length really possible? At what cost?
Amanda Bertsch, author of the 2023 NeurIPS paper Unlimiformer, will describe the architecture and take questions at this Friday's Oxen.ai Paper Club.
Greg Schoeninger u/FallMindless3563, Oxen CEO and Master of Plain Speak, will help interpret the concept and relate it to other papers we have reviewed.
Call: https://oxen.ai/community
The trick asserted to make Unlimited Context Length possible: offload the cross-attention computation to a k-nearest neighbors (K-NN) index.
I tweeted someone's clever animation of K-NN here: https://x.com/ParallaxAngle/status/1817672116243972287
Paper: https://arxiv.org/abs/2305.01625
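To build intuition for the trick above, here's a minimal sketch (mine, not from the paper) of what "offload cross-attention to a K-NN index" might look like for a single query vector. A brute-force numpy search stands in for a real index like faiss, and the per-head/per-layer details that Unlimiformer actually handles are ignored:

```python
import numpy as np

def knn_cross_attention(query, keys, values, k=16):
    """Sketch: attend only over the top-k keys retrieved for this query.

    Simplified illustration of the Unlimiformer idea -- the encoder hidden
    states would live in a k-NN index shared across heads and layers; here a
    brute-force numpy search stands in for that index.
    """
    # "Retrieve" the k keys with the largest inner product with the query.
    scores = keys @ query                       # (num_keys,)
    topk = np.argpartition(-scores, k)[:k]      # indices of the k best keys

    # Standard scaled dot-product attention, restricted to those k keys.
    d = query.shape[-1]
    logits = scores[topk] / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values[topk]               # (d_v,)

# Toy usage: 100k "encoder states", but attention only ever touches k of them.
rng = np.random.default_rng(0)
keys = rng.normal(size=(100_000, 64))
values = rng.normal(size=(100_000, 64))
query = rng.normal(size=(64,))
print(knn_cross_attention(query, keys, values, k=16).shape)  # (64,)
```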
My first 5 questions. I've only read the abstract so far.
1) How is there not a massive performance cost to running the K-NN lookup for every cross-attention calculation?
2) How do you pick your k value?
3) If Unlimiformer works, does this mean Retrieval Augmented Generation (RAG) is no longer necessary?
4) What are other important but less obvious implications of unlimited context length?
5) What is something that would surprise most people about:
a) Getting a paper accepted into the NeurIPS conference?
b) Attending the NeurIPS conference itself?
From a podcast I heard, it sounded insane to be there.