That is what a RAG system does. The PDF is chunked, embedded, and thrown into a vector store, and then when you prompt it, only the chunks most relevant to your question are retrieved, stuffed into the context, and sent to the LLM.
So yeah, it's kinda smoke and mirrors. For some long PDFs it works really well: a 500-page PDF covering many disparate topics may do fine, since any given question usually only touches a few chunks and those are easy to retrieve.
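For anyone curious what that looks like in practice, here's a rough sketch of the retrieve-then-read loop. It's illustrative only: `embed` and `ask_llm` are placeholders for whatever embedding model and chat API you actually use, and the "vector store" is just a Python list where a real system would use an actual vector database.

    # Minimal sketch of the retrieve-then-read flow described above.
    # `embed` and `ask_llm` are stand-ins, not a real API.
    from typing import Callable, List, Tuple
    import math

    def chunk(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
        """Split the extracted PDF text into overlapping chunks."""
        chunks, start = [], 0
        while start < len(text):
            chunks.append(text[start:start + size])
            start += size - overlap
        return chunks

    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def answer(question: str, pdf_text: str,
               embed: Callable[[str], List[float]],
               ask_llm: Callable[[str], str],
               top_k: int = 4) -> str:
        # "Vector store": just (chunk, embedding) pairs in memory here.
        store: List[Tuple[str, List[float]]] = [(c, embed(c)) for c in chunk(pdf_text)]

        # Retrieve only the chunks most similar to the question...
        q_vec = embed(question)
        best = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:top_k]

        # ...and stuff just those into the prompt. The model never sees the rest of the PDF.
        context = "\n\n".join(c for c, _ in best)
        return ask_llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")

The point is that the model only ever sees those top-k chunks, never the whole PDF, which is exactly where the smoke and mirrors come in.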
Indeed. Would only add that context windows keep multiplying in size. Who knows how long that Moore's-Law-like growth will hold, but the window keeps getting better.
I've found that longer context windows aren't a linear improvement in response quality, though. The longer the context, the broader but less sharp and less accurate the answers seem to get. I've been using GPT-4 Turbo with the longer context window for coding tasks, but it hasn't improved the responses as much as you'd think; it seems more "distracted" now, which perhaps makes some intuitive sense.
I can give GPT-4 Turbo many full code files for a complex coding task, but despite the larger window it seems to fail more often, ignore parts of the context, or just not really answer the question.
Basically, it makes no sense to “chat” with a 500-page PDF with today's LLMs.