60k is tiny, if it's making recall mistakes that early then you might have some ...

Bolwin · 2026-06-14T14:46:30 1781448390

I don't use Claude Code. I use my own handwritten agent (formerly using Pi) and know every token that goes into it. There are zero memories to confuse it. The system prompt is 200 tokens and completely self-consistent.

Plus I've found that the only time models go above 100k tokens anyway is when they've started looping, at which point it's much better to go back anyway.

Anecdotally, most models know their recall is terrible (or have been trained to act as if it is), which is why they constantly reread files before editing or while reasoning.

danielbln · 2026-06-14T08:16:12 1781424972

Yeah, 60k is ludicrous. I've barely seeded the context at that point, and I don't see context-related degradation until well into the 600-700k range.

qsera · 2026-06-14T09:31:31 1781429491

In this thread: People tossing coins independently and fighting over the result they got.

kuboble · 2026-06-14T11:25:49 1781436349

No it's not.

It seems that people have different workflows or repos, or memories or prompts or expectations.

diab0lic · 2026-06-14T11:55:36 1781438136

For what it’s worth, as a third party I read your and qsera’s comments as saying the same thing.

kuboble · 2026-06-14T14:43:04 1781448184

Maybe I misread the comment then.

I read it as saying that a model's performance is random, and that the observed differences in opinion are the result of overinterpreting random outcomes.

I think, however, that some people seem to always be lucky, which indicates that it is not random but rather down to some fixed differences between people and their environments.

qsera · 2026-06-15T02:38:42 1781491122

A model's adherence to some configuration is a matter of probability. There might be some underlying pattern, but as far as I understand it is not documented, and it may even be impossible to document. So people are just trying stuff and sharing what appears to work. There is no causal link anywhere in these recommendations; they are just based on spurious correlations.
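
To make the coin-toss point concrete, here is a minimal sketch. Everything in it is made up for illustration (the function name, the 70% success rate, the session counts); none of it is a measurement of any real model. It only shows that users sharing one identical underlying success probability can still walk away with quite different impressions from a handful of sessions each.

```python
import random


def observed_success_rates(true_rate, sessions_per_user, users, seed=0):
    """Simulate users who all face the SAME underlying success probability.

    Returns the fraction of "good" sessions each user happened to observe.
    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    rates = []
    for _ in range(users):
        # Each session is an independent coin toss with P(success) = true_rate.
        wins = sum(rng.random() < true_rate for _ in range(sessions_per_user))
        rates.append(wins / sessions_per_user)
    return rates


if __name__ == "__main__":
    # Eight users, ten sessions each, identical 70% "true" success rate.
    for user, rate in enumerate(observed_success_rates(0.7, 10, 8), start=1):
        print(f"user {user}: {rate:.0%} of sessions went well")
```

This does not rule out fixed differences between setups. It only shows that small samples alone are enough to produce disagreement.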

embedding-shape · 2026-06-14T09:23:34 1781429014

> I've barely seeded the context at that point

I think that's the issue, rather than 60K being small.

Most of the actual edits/changes I request from codex are solved within 100-150K tokens. Beyond 200K I'd definitely try to restart the session as soon as I could, as all models are horrible once you get past ~20% of the total context size. And this is while working on million+ LOC codebases.

The problem, I guess, is that there is no solid, concrete evidence of this (to me [and seemingly others] obvious) degradation. It should be easy to prove, yet no one has time to sit down and show it :)
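
A rough sketch of what such a test could look like, under heavy assumptions: `ask_model` is a hypothetical callable (prompt string in, answer string out) standing in for whatever client you use, and the filler text, secret number, and function names are invented for illustration. It hides one fact at several depths in prompts of growing size and reports how often the answer comes back. Retrieving a single planted fact is a much easier task than keeping minor details straight during real edits, so treat this as a starting point rather than proof either way.

```python
def build_prompt(needle, filler_line, total_lines, depth):
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    lines = [filler_line] * total_lines
    lines.insert(int(depth * total_lines), needle)
    return "\n".join(lines) + "\n\nWhat is the secret number mentioned above?"


def recall_by_size(ask_model, sizes, depths, secret="7421"):
    """Return {filler line count: fraction of depths where the secret was recalled}.

    `ask_model` is a hypothetical callable: prompt string in, answer string out.
    """
    needle = f"The secret number is {secret}."
    filler = "Nothing to see on this line."
    results = {}
    for total_lines in sizes:
        hits = sum(
            secret in ask_model(build_prompt(needle, filler, total_lines, depth))
            for depth in depths
        )
        results[total_lines] = hits / len(depths)
    return results


if __name__ == "__main__":
    # Stub model that "forgets" on long prompts, only to show the harness running.
    def stub(prompt):
        return "7421" if len(prompt) < 2000 else "unknown"

    print(recall_by_size(stub, sizes=[10, 1000], depths=[0.0, 0.5, 1.0]))
```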

But the likelihood of a model getting minor details wrong seems to skyrocket once you're above some magical threshold between 15-20%, and I've hit that issue enough times that my workflow now tries to prevent it.
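
As a minimal sketch, the rule of thumb boils down to a single ratio check. The function name and the 1M-token window in the example are assumptions for illustration; the threshold is the anecdotal ~20% from above, not a documented limit of any model.

```python
def should_restart(used_tokens, context_window, threshold=0.20):
    """Return True once the session has used at least `threshold` of the window.

    The 0.20 default is an anecdotal rule of thumb, not a documented limit.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return used_tokens / context_window >= threshold


if __name__ == "__main__":
    window = 1_000_000  # assumed window size, for illustration only
    for used in (60_000, 150_000, 250_000):
        print(used, should_restart(used, window))
```

With an assumed 1M-token window, 150K tokens stays under the threshold and 250K trips it.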

rtpg · 2026-06-14T12:20:08 1781439608

what are y'all doing to hit that? Do you just not give it any pointers and let it churn away? What kind of context are you handing off?

I routinely get claude to do things pretty decently and finish up easily in the 4-5 digit range of tokens. It seems to be doing the right kind of thing to not waste its time looking at 1000 files.

da_grift_shift · 2026-06-14T08:18:48 1781425128

>you might have some false memories or incorrect instructions in your CLAUDE.md

    "YOU'RE HOLDING IT WRONG!"

RugnirViking · 2026-06-14T08:41:54 1781426514

did you internalize what was wrong with that quote when it was said? does it apply here?