
Handling state (especially long-term) is really a struggle for LLMs right now. This issue should become easier to work with as context windows scale up in the next couple years (or months, who knows!).



People are already making progress on this, e.g. the H3 project[1].

[1] https://arxiv.org/abs/2212.14052
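For anyone curious what's different under the hood: H3 builds on state space models, where a fixed-size hidden state is updated at every step instead of re-attending over the whole token window. Here's a minimal sketch of that linear recurrence (the dimensions and the stable A below are my own illustrative choices, not values from the paper):

    import numpy as np

    # Linear state space model (SSM) recurrence, the core of S4/H3:
    #   x_t = A x_{t-1} + B u_t   (fixed-size hidden state update)
    #   y_t = C x_t               (readout)

    rng = np.random.default_rng(0)
    d_state, seq_len = 16, 1000

    A = 0.99 * np.eye(d_state)          # stable state transition
    B = rng.normal(size=(d_state, 1))   # input projection
    C = rng.normal(size=(1, d_state))   # output projection

    u = rng.normal(size=seq_len)        # scalar input stream
    x = np.zeros((d_state, 1))          # the "memory": fixed size
    ys = []
    for t in range(seq_len):
        x = A @ x + B * u[t]            # O(1) work per new token
        ys.append((C @ x).item())

Because the state is a fixed-size vector, each new token costs O(1) to incorporate, and since the recurrence is linear and time-invariant, S4/H3 can also train it in parallel as one long convolution.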


This is the most excited I've ever been about sequence models! If the claims of the H3 (and S4) authors hold up, then we are on the cusp of something very big that will provide another quantum leap in LLM performance. I worry that the claims may come with a hidden catch, but we just have to work with these systems to find out.

I'll venture that once truly long-range correlations can be managed (at scales 100-1000x what's possible with current GPTs), all the open questions about logical reasoning can be answered by training on the right corpus and applying the right kinds of human-guided reinforcement.
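Some rough arithmetic on why that scale matters, assuming dense self-attention costs O(n^2) in sequence length while the FFT-based convolution form of S4/H3 costs O(n log n) (the 4K base length is just illustrative):

    import math

    for scale in (1, 100, 1000):
        n = 4_096 * scale                  # hypothetical context length
        attn = n ** 2                      # dense self-attention: O(n^2)
        ssm = n * math.log2(n)             # FFT-based SSM conv: O(n log n)
        print(f"n={n:>9,}: attention/SSM op ratio ~{attn / ssm:,.0f}x")

At 1000x the context, the quadratic term dominates completely, which is why people are looking past attention for this.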


Google scaled context to 40K tokens


Using tokens as context still sounds to me like you're asking someone to read back text that someone else wrote and continue the story. It might work but it's not the best way to get a coherent narrative.


How can you have a coherent narrative if you can't link things across very large contexts?


I'm saying the context should consist of more than just tokens.
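One hypothetical shape that idea could take: a context object carrying a short raw token window plus a fixed-size recurrent state. Everything below is my own sketch, not an existing API; embed and update stand in for learned networks:

    from dataclasses import dataclass, field
    from typing import Callable
    import numpy as np

    @dataclass
    class HybridContext:
        # Hypothetical design: recent tokens kept verbatim,
        # everything older compressed into a fixed-size state.
        tokens: list = field(default_factory=list)
        state: np.ndarray = field(default_factory=lambda: np.zeros(256))

        def push(self, token_id: int, embed: Callable, update: Callable):
            self.tokens = (self.tokens + [token_id])[-2048:]  # bounded window
            self.state = update(self.state, embed(token_id))  # long-term memory

    # Toy stand-ins, e.g. a decaying additive memory:
    #   embed  = lambda t: np.full(256, t / 50_000)
    #   update = lambda s, e: 0.99 * s + e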



