kogold's comments

kogold · 2026-04-05T11:46:06 1775389566

Let me rephrase that for you:

"Interesting idea! Token consumption sure is an issue that should be addressed, and this is pretty funny too! However, I happen to have an unproven claim that tokens are units of thinking, and therefore, reducing the token count might actually reduce the model's capabilities. Did anybody using this by chance notice any degradation (since I did not bother to check myself)?"

Have a nice day!

dang · 2026-04-05T18:54:36 1775415276

"Don't be snarky."

https://news.ycombinator.com/newsguidelines.html

Chance-Device · 2026-04-05T13:12:44 1775394764

Let’s see, I think these pretty much map out a little chronology of the research:

https://arxiv.org/abs/2112.00114 https://arxiv.org/abs/2406.06467 https://arxiv.org/abs/2404.15758 https://arxiv.org/abs/2512.12777

First that scratchpads matter, then why they matter, then that they don’t even need to be meaningful tokens, then a conceptual framework for the whole thing.

bsza · 2026-04-05T13:51:06 1775397066

I don’t see the relevance. The discussion is over whether boilerplate text that occurs intermittently in the output, purely for the sake of linguistic correctness or sounding professional, is of any benefit. Chain of thought doesn’t look like that to begin with; it’s a contiguous block of text.

Chance-Device · 2026-04-05T14:30:07 1775399407

To boil it down: chain of thought isn’t really chain of thought, it’s just more token generation output to the context. The tokens are participating in computations in subsequent forward passes that are doing things we don’t see or even understand. More LLM generated context matters.
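A minimal sketch of that loop, assuming a toy stand-in for the model: `toy_next_token` and `generate` are hypothetical names, and the arithmetic inside is made up purely so that each output depends on the whole context. It is not how any real LLM computes a token; it only shows that every generated token is appended to the context and re-read on the next pass.

```python
from typing import Callable, List


def toy_next_token(context: List[int]) -> int:
    """Hypothetical stand-in for one forward pass.

    The result depends on every token currently in the context,
    which is the only property this sketch relies on.
    """
    return (sum(context) * 31 + len(context)) % 97


def generate(
    prompt: List[int],
    steps: int,
    next_token: Callable[[List[int]], int] = toy_next_token,
) -> List[int]:
    """Autoregressive loop: each new token is fed back in as input."""
    context = list(prompt)
    for _ in range(steps):
        token = next_token(context)  # one pass over everything generated so far
        context.append(token)        # the output becomes part of the next input
    return context[len(prompt):]


prompt = [3, 1, 4]
plain = generate(prompt, 6)
# Same prompt, but two extra tokens are already sitting in the context.
padded = generate(prompt + [0, 0], 6)
print(plain)
print(padded)
```

In this toy the two runs diverge from the first generated token onward, because the extra context changes the input to every later pass. Whether extra tokens help or hurt a real model is the empirical question the thread is arguing about; the sketch only shows why they cannot be assumed to be inert.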

bitexploder · 2026-04-05T13:59:09 1775397549

That is not how CoT works. It is all in context. All influenced by context. This is a common and significant misunderstanding of autoregressive models and I see it on HN a lot.

j16sdiz · 2026-04-05T13:58:00 1775397480

I don't see the relevance -- and casually dismiss years of research without even trying to read those papers.

bitexploder · 2026-04-05T13:58:20 1775397500

That "unproven claim" is actually a well-established concept called Chain of Thought (CoT). LLMs literally use intermediate tokens to "think" through problems step by step. They have to generate tokens to talk to themselves, debug, and plan. Forcing them to skip that process by cutting tokens, like making them talk in caveman speak, directly restricts their ability to reason.

ShowalkKama · 2026-04-05T12:09:09 1775390949

The fact that more tokens = more smart should be expected, given CoT / thinking / other techniques that increase model accuracy by using more tokens.

Did you test that "caveman mode" has similar performance to the "normal" model?

Garlef · 2026-04-05T12:18:39 1775391519

Yes but: If the amount is fixed, then the density matters.

A lot of communication is just mentioning the concepts.

bitexploder · 2026-04-05T14:46:08 1775400368

That is part of it. They are also trained to think in very well mapped areas of their model. All the RLHF, etc. tuned on their CoT and user feedback of responses.

ano-ther · 2026-04-05T13:47:46 1775396866

Looking at the skill.md, wouldn’t this actually increase token use, since the model now needs to reformat its output?

Funny idea though. And I’d like to see a more matter-of-fact output from Claude.

collingreen · 2026-04-05T18:57:07 1775415427

I assume you're a human but wow this is the type of forum bot I could really get behind.

Take it a step further and do kind of like that xkcd where you try to post and it rewrites it like this and if you want the original version you have to write a justification that gets posted too.

Chef's kiss

mynegation · 2026-04-05T12:12:19 1775391139

No, let me rephrase it for you. “tokens used for think. Short makes model dumb”

freehorse · 2026-04-05T13:03:34 1775394214

Talk a lot not same as smart

taneq · 2026-04-05T14:12:36 1775398356

Think before talk better though

freehorse · 2026-04-05T15:19:33 1775402373

Think makes smart. But think right words makes smarter, not think more words. Smart is elucidate structure and relationships with right words.

ben_w · 2026-04-05T17:44:15 1775411055

think make smart, llm approximate "think" with context, llm not smart ever but sometimes less dumb with more word

estearum · 2026-04-05T13:00:10 1775394010

Can't you know that tokens are units of thinking just by... like... thinking about how models work?

xpe · 2026-04-05T13:33:02 1775395982

> Can't you know that tokens are units of thinking just by... like... thinking about how models work?

Seems reasonable, but this doesn't settle probably-empirical questions like: (a) to what degree is 'more' better? (b) how important are filler words? (c) how important are words that signal connection, causality, influence, reasoning?

estearum · 2026-04-05T13:54:19 1775397259

Right, there's probably something more subtle like "semantic density within tokens is how models think"

So it's probably true that the "Great question!---" type preambles are not helpful, but that there's definitely a lower bound on exactly how primitive of a caveman language we're pushing toward.

gchamonlive · 2026-04-05T13:09:53 1775394593

Can't you just know that the earth is the center of the world by... like... just looking at how the world works?

estearum · 2026-04-05T13:37:58 1775396278

Actually you'd trivially disprove that claim if you're starting from mechanistic knowledge of how orbits work, like how we have mechanistic knowledge of how LLMs work.

gchamonlive · 2026-04-05T13:51:04 1775397064

You have empirical observations, like replicating a fixed set of inner layers to make it think longer, or that you seem to have encode and decode layers. But exactly why those layers are the way they are, how they come together for emergent behaviour... Do we have mechanistic knowledge of that?

ben_w · 2026-04-05T17:48:56 1775411336

I think we've *only* got the mechanism, not the implications.

Compare with fluid dynamics; it's not hard to write down the Navier–Stokes equations, but there's a million dollars available to the first person who can prove or give a counter-example of the following statement:

  In three space dimensions and time, given an initial velocity field, there exists a vector velocity and a scalar pressure field, which are both smooth and globally defined, that solve the Navier–Stokes equations.

- https://en.wikipedia.org/wiki/Navier–Stokes_existence_and_sm...

xpe · 2026-04-05T15:23:02 1775402582

Though the above exchange felt a tiny bit snarky, I think the conversation did get more interesting as it went on. I genuinely think both people could probably gain by talking more -- or at least figuring out a way to move past the surface-level differences. Yes, humans designed LLMs. But this doesn't mean we understand their implications even at this (relatively simple) level.