Thanks, I was wondering about this. Has this been your experience universally, across many models? Or are some worse than others at what I just learned is called In-Context Learning?
This has been my experience across all of them, yes. Especially when I ask it to select a decently-sized subset of the text I pass in, as opposed to just doing a needle-in-the-haystack type thing.
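For anyone unfamiliar, a needle-in-the-haystack probe is easy to sketch: bury one fact at a chosen depth in filler text, ask the model to retrieve it, and score the reply. This is a minimal, hypothetical setup (the function names, filler, and passphrase are mine, not from any library), with the actual model call left to you:

```python
# Needle-in-a-haystack probe sketch: bury one fact ("the needle") at a
# chosen depth inside repeated filler, then check whether a model's
# answer surfaces it. No API calls here; plug in your own model client.

FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret passphrase is 'blue-giraffe-42'."

def build_haystack(depth: float, target_chars: int = 2000) -> str:
    """Repeat filler to roughly target_chars, inserting the needle at
    the given depth fraction (0.0 = start of context, 1.0 = end)."""
    reps = target_chars // len(FILLER)
    chunks = [FILLER] * reps
    chunks.insert(int(depth * reps), NEEDLE + " ")
    return "".join(chunks)

def recalled(answer: str) -> bool:
    """Crude scoring: did the reply contain the buried passphrase?"""
    return "blue-giraffe-42" in answer

# You'd send build_haystack(depth=0.5) plus a question like
# "What is the secret passphrase?" to the model, then call
# recalled() on its reply. Sweeping depth from 0.0 to 1.0 (and
# growing target_chars) maps out where recall degrades.
prompt = build_haystack(depth=0.5)
```

Selecting a large subset of the input, as described above, is a harder task than this single-fact retrieval, which is part of why results diverge so much between models.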
Recall ability varies quite a bit. GPT-4-Turbo's recall becomes quite spotty as you reach 30-50k tokens, whereas Claude-3 has really good recall over the entire context window.