Unbounded Context with Memory
1 point by codelion 52 days ago | hide | past | favorite | 1 comment
Google recently released FRAMES (https://huggingface.co/datasets/google/frames-benchmark), a benchmark designed to test Retrieval-Augmented Generation (RAG) applications on factuality, retrieval accuracy, and reasoning.

The benchmark didn't come with an evaluation script, so we first implemented one in optillm: https://github.com/codelion/optillm/blob/main/scripts/eval_frames_benchmark.py

I had implemented a memory plugin (https://github.com/codelion/optillm/blob/main/optillm/plugins/memory_plugin.py) in optillm that adds short-term memory and unbounded context to LLMs. We used FRAMES to evaluate the memory plugin with Google's Gemma2 model. Gemma2 has a context window of 8192 tokens, so in the paper Google only reported its results for the naive prompt, which doesn't include the text retrieved via RAG.

However, with the memory plugin in optillm we can make the context of any LLM unbounded. We managed to boost the accuracy to 30.1%, versus the 5.1% reported by Google in the paper. We were also able to get almost the same accuracy as Gemini with just gpt-4o-mini using optillm memory, even though gpt-4o-mini has a context window 1/10 the size of Gemini's. I also ran into a very interesting refusal from Gemini to answer one of the queries. You can see the prompt here: https://aistudio.google.com/app/prompts/13PYnnu6UpukanIen9ClaFKgTI3lgPi88
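
To give a rough idea of how a memory layer can fit a long document into a small context window, here is a minimal sketch. It is not the optillm memory plugin's actual code, and the real plugin may retrieve differently. The function names, the word-based budget, and the term-overlap scoring are all assumptions made for illustration: the document is split into chunks, the chunks are ranked against the query, and only the best ones that fit the budget go into the prompt.

```python
import re
from collections import Counter


def split_into_chunks(text, max_words=100):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, chunk):
    """Hypothetical relevance score: occurrences of query terms in the chunk."""
    query_terms = set(tokenize(query))
    counts = Counter(tokenize(chunk))
    return sum(counts[term] for term in query_terms)


def build_prompt(query, long_context, budget_words=200, max_words=100):
    """Keep the highest-scoring chunks that fit in budget_words, in document order."""
    chunks = split_into_chunks(long_context, max_words)
    ranked = sorted(range(len(chunks)), key=lambda i: score(query, chunks[i]), reverse=True)
    selected, used = [], 0
    for i in ranked:
        size = len(chunks[i].split())
        if used + size > budget_words:
            continue  # this chunk would overflow the budget, try the next one
        selected.append(i)
        used += size
    selected.sort()  # restore the original document order
    context = "\n\n".join(chunks[i] for i in selected)
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    document = ("filler " * 500) + "Dwight Schrute works at Dunder Mifflin. " + ("padding " * 500)
    print(build_prompt("Where does Dwight Schrute work?", document)[:120])
```

The budget here is counted in words to keep the sketch dependency-free. With a real model you would count tokens using that model's tokenizer, and likely swap the overlap score for something stronger such as TF-IDF or embeddings.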

The prompt just contains text from the Wikipedia page for Dwight Schrute, the character from The Office (https://en.wikipedia.org/wiki/Dwight_Schrute), which is part of the benchmark as one of the queries: https://huggingface.co/datasets/google/frames-benchmark/viewer/default/test?q=dwight_schrute&row=57

All the Gemini models refuse to answer the query and just block the response, even with all safety settings set to None. I am not sure how Google ran the evals for their benchmark without hitting this issue.




what are you even trying to say dude?



