
Context length doesn't unlock intelligence, just more information, e.g. adding info that wasn't part of training (or wasn't heavily weighted in training).



> context length doesn't unlock intelligence, just more information

This isn't correct (for most definitions of "unlock intelligence"). In-Context Learning (ICL) can do everything off-line training can do.

> We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases and can learn high-dimensional functions with numerical inputs.

https://arxiv.org/abs/2404.11018


Right, I should have been clearer than the words "active intelligence." But as one use of this... would an effectively unlimited context, say 1 to 10 billion tokens used as a system prompt with "just" 32k left for the user, allow a model to be updated every day, in between actual training runs? (This is in the future, where training only takes a month, or much less.)

I guess part of what I really don't understand is how context tokens compare to training weights, as far as value to the final response. Would a giant context window muddle the value of weights?

(Maybe what I am missing is the human-feedback on the training weights? If the giant system prompt I am imagining is garbage, then that would be bad.)


In-context learning (ICL) is a thing. People aren't entirely sure how it works[1].

LLMs are very effective at few-shot learning via the context window, so for all practical purposes, yes, large context windows do allow for continuous learning.

Note that the context needs to be loaded and processed on every request to the LLM though - so all that additional information has to be "retaught" each time.
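To make that concrete, here's a minimal sketch of many-shot ICL from the caller's side (the `complete` function is just a stand-in for whatever LLM API you use, not a real library call): the examples are simply prepended to the prompt, and the whole thing is resent and reprocessed on every request.

    # Minimal sketch of many-shot in-context learning.
    # `complete` is a placeholder for a real LLM call; swap in your provider's API.

    def complete(prompt: str) -> str:
        raise NotImplementedError("call your LLM provider here")

    # In-context "training set": with a long-context model this could be
    # thousands of (input, output) pairs instead of a handful.
    examples = [
        ("2, 7", "9"),
        ("3, 11", "14"),
        ("5, 8", "13"),
    ]

    def build_prompt(query: str) -> str:
        shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
        return f"{shots}\nInput: {query}\nOutput:"

    def answer(query: str) -> str:
        # The full example set is rebuilt and reprocessed on every call,
        # which is the "retaught each time" cost mentioned above.
        return complete(build_prompt(query))

The trade-off versus fine-tuning is that nothing gets baked into the weights; the "learning" only lasts as long as the examples stay in the context.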

[1] https://openreview.net/pdf?id=992eLydH8G "These results indicate that the equivalence between ICL (ed: In-context-learning) and GD (ed: Gradient Descent) is an open hypothesis, requires nuanced considerations, and calls for further studies."


Thank you so much for your response. It's amazing what typing a little acronym like "ICL" can do as far as sharing knowledge. This is so cool!

Also, your link appears to exactly address my question. It's late here, but I am very excited to do my best at understanding that paper in the morning.


I mean, working memory is an aspect of intelligence.



